Chapter 1


Introduction 


1.1  Problem  Overview 

Most  practical  problems  encountered  by  operations  research  practitioners 
intrinsically  possess  some  level  of  uncertainty  within  their  own  parameters. 
While  stochastic-based  modeling  techniques  inherently  accommodate  such 
randomness,  they  typically  cannot  prescribe  an  optimal  course  of  action.  By 
contrast,  well-defined  optimization  methods  can  find  an  optimal  solution,  but  only 
under  the  simplifying  assumption  of  deterministic  parameters  (usually  expected 
values).  Unfortunately,  neither  modeling  paradigm  can  —  by  itself  —  give  a 
decision  maker  the  optimal  course  of  action  that  hedges  against  extreme 
realizations  of  the  random  parameters. 

Long  recognizing  this  deficiency,  researchers  from  both  camps  have 
attempted  to  bridge  the  two  by  incorporating  elements  of  one  within  the 
operational  framework  of  the  other.  For  stochastic-oriented  methods,  this 
migration  has  occurred  primarily  under  numerical  approximation  methods, 
typically  by  applying  optimization  techniques  such  as  gradient  search  and 
response  surface  analysis  to  a  simulation  model  of  the  problem.  For  mathematical 
programming-based  algorithms,  incorporating  stochastic  elements  has  led  to 
algorithms  involving  parameter  approximations,  scenario  representation  through 
large-scale  models,  iterative  sampling,  and  parallel  processing.  In  both  cases,  the 
computational  requirements  can  be  formidable. 

It is in this spirit that this dissertation offers a new approach to analyzing
the  well-defined  problem  of  stochastic  programming  with  recourse.  Strictly 
speaking,  as  a  Monte  Carlo  simulation  this  research  falls  in  the  stochastic-based 
category  of  modeling  techniques.  However,  the  underlying  principles  that 
motivate  the  simulation  come  from  linear  programming  theory;  mathematical 
programming  literature  dominates  previous  research  in  recourse  optimization 
problems; and most recourse problems available for comparison purposes have
been  solved  using  LP-based  methods.  Consequently,  this  chapter  and  the 
following  literature  review  will  initially  proceed  from  the  mathematical 
programming  perspective. 

1.2  Problem  Description 

The  literature  offers  a  diverse  class  of  problems  under  the  category  of 
stochastic  linear  programming  (SLP)  based  on  several  properties. 

1.  Random  Parameter  Location.  This  characteristic  of  SLP  concerns  which 
parameters  of  the  model  (objective,  constraint  matrices,  or  right-side 
vectors)  are  random  variables. 

2.  Probabilistic  Representation.  This  aspect  of  SLP  concerns  the  modeling 
issue  of  the  random  variables;  i.e.,  their  portrayal  either  through 
probability  distribution  functions  or  by  representative  scenarios. 
Furthermore,  this  issue  addresses  distributional  assumptions  of  type  and 
parameter, or in the case of scenario modeling, the sampling of a wide
variety  of  possible  outcomes. 

3.  Structural  Assumptions.  This  feature  of  SLP  relates  to  the  number  of  time 
periods associated with the realization of the random variables; the type and
availability of recourse open to the decision-maker after such
realizations; chance constraints; and assumptions on the separability of
decision  variables  or  scenarios. 

4.  Constraint  Flexibility.  This  form  of  SLP  allows  for  the  possibility  of 
constraint  violation,  either  through  predetermined  levels  of  probability  or 
through  explicit  error  vectors  to  account  for  infeasibility. 

One  category  of  SLP  that  was  recognized  and  reviewed  early  on  (Dantzig  1955), 
and  has  continually  reappeared  in  the  literature,  is  the  two-stage  problem  with 
fixed  and  complete  recourse: 

$$\min\; Z(x) = cx + E[h(x,\omega,T)], \quad \text{s.t. } Ax = b,\; x \ge 0, \tag{1.1a}$$

where

$$h(x,\omega,T) = \min\; dy, \quad \text{s.t. } Wy = \omega - Tx,\; y \ge 0, \tag{1.1b}$$

and ω and T contain one or more elements that are random variables with
prescribed  distributions.  This  form  of  stochastic  programming  represents  many 
practical problems where the uncertainty lies in resource demand or consumption,
and in recent years it has been the research focus of advanced
decomposition algorithms involving Monte Carlo sampling, statistical measures
of ω and T, and convergence. However, the literature's approach to solving (1.1)
has focused exclusively on finding Min Z(x), while little has been reported
regarding solution sensitivity or the underlying distribution of
h(x,ω,T). As a practical matter, though, such analysis provides the decision-maker
with the insight to choose a solution that incorporates subjective
assessments that are not explicitly modeled. Furthermore, the distributional
properties of (1.1b) characterize the behavior of the recourse function in a way
that gives the decision-maker insight beyond the sole criterion of expected value.

Justification  for  this  line  of  inquiry  begins  with  the  three  primary 
characteristics  known  to  be  true  for  (1.1). 

1.  Decision  Variables.  The  structure  of  (1.1)  dictates  that  x  represents  the 
only  true  decision  vector;  once  made,  the  recourse  problem  (1.1b) 
becomes  a  deterministic  function  of  x  and  the  realization  of  the  random 
variables in T and ω. In a simulation context, this result places x as the
independent variable and Z(x) as the dependent response. Furthermore,
preliminary and past research provides empirical evidence that the region
around the optimal solution x* is often 'flat'; i.e., for some small
value ε ≥ 0 there exists a considerable range of decision variables x' such
that Z(x') = Z(x*) + ε (including multiple optimal solutions; i.e., ε = 0).

2.  Convexity.  The  literature  shows  that  Z(x)  is  a  piecewise  linear  convex 
function  of  x,  a  property  that  holds  several  important  ramifications  from  a 
simulation perspective. First, such convexity suggests approximating Z(x)
with  a  quadratic  polynomial  formulation  of  x.  Second,  a  quadratic 
assumption  greatly  reduces  the  size  of  the  experimental  design  used  to 
collect  the  data.  Finally,  knowing  the  general  shape  of  the  response  allows 
for  an  independent  verification  of  the  validity  of  the  polynomial 
approximation  using  its  eigenvalues  —  a  capability  generally  not  available 
in  response  surface  methodology  due  to  the  unknown  underlying 
functional  form. 

3.  Distributional  Form.  Both  the  literature  and  preliminary  research  show 
that the distribution of h(x,ω,T) varies with x. Consequently, there may be
cases where x* is not the best answer if the distribution of h(x*,ω,T) is
less  favorable  in  certain  aspects  (e.g.,  form,  variance,  range)  relative  to 
other  parts  of  the  feasible  region. 
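
The convexity argument in item 2 can be illustrated with a minimal sketch. It assumes a hypothetical one-dimensional simple-recourse cost (the scenario data, `C`, `Q`, and the design points are invented for illustration and are not one of the dissertation's test problems), fits a quadratic by ordinary least squares, and checks the sign of the fitted curvature, which in one dimension plays the role of the eigenvalue check.

```python
# Toy illustration only: a 1-D simple-recourse cost, not a problem from the text.
SCENARIOS = [(4.0, 0.3), (8.0, 0.4), (12.0, 0.3)]  # (omega, probability), hypothetical
C, Q = 1.0, 4.0  # first-stage unit cost and recourse (shortage) unit cost, hypothetical


def expected_cost(x):
    """Z(x) = c*x + E[q * max(omega - x, 0)], computed exactly over the scenarios."""
    return C * x + sum(p * Q * max(w - x, 0.0) for w, p in SCENARIOS)


def fit_quadratic(xs, zs):
    """Least-squares fit z ~ b0 + b1*x + b2*x^2 via the 3x3 normal equations."""
    s = [sum(x ** k for x in xs) for k in range(5)]                  # power sums of x
    m = [sum(z * x ** k for x, z in zip(xs, zs)) for k in range(3)]  # moments of z
    a = [[s[i + j] for j in range(3)] + [m[i]] for i in range(3)]    # augmented matrix
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(3):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [u - f * v for u, v in zip(a[r], a[col])]
    return [a[i][3] / a[i][i] for i in range(3)]


design = [8.0 + 0.5 * k for k in range(17)]  # design points on [8, 16]
b0, b1, b2 = fit_quadratic(design, [expected_cost(x) for x in design])
curvature = 2.0 * b2  # second derivative of the fit: the 1-D analogue of a Hessian eigenvalue
print(f"b0={b0:.3f} b1={b1:.3f} b2={b2:.3f} convex_fit={curvature > 0}")
print(f"minimizer of the fitted quadratic: {-b1 / (2.0 * b2):.3f}")
```

A negative fitted curvature would contradict the known convexity of Z(x) and so would flag a poor approximation. With several decision variables the same check would apply to every eigenvalue of the fitted Hessian.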

Therefore,  this  dissertation  proposes  expanding  the  analysis  of  (1.1)  beyond  just 
finding  the  optimal  solution  by  (1)  deriving  low-order  polynomial  approximations 
to  Z(x)  in  the  region  of  optimality,  and  (2)  comparing  important  characteristics  of 
the distributions of h(x,ω,T) for those values of x in the region of optimality.
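
As a rough sketch of the second proposal, the following samples the recourse cost at several first-stage decisions and compares spread and tail measures rather than the mean alone. The cost function, the lognormal demand model, and all constants are assumptions chosen for illustration. The same seed is reused for every x, so each decision is evaluated on the same sample of ω.

```python
import random
import statistics

C, Q = 1.0, 4.0  # hypothetical first-stage and recourse unit costs


def recourse_cost(x, omega):
    """h(x, omega) for a toy simple-recourse problem: buy any shortfall at unit cost Q."""
    return Q * max(omega - x, 0.0)


def summarize(x, n=20000, seed=1):
    """Sample h(x, omega) and report an estimate of Z(x) plus spread and tail measures."""
    rng = random.Random(seed)  # same seed for every x: common random numbers
    h = sorted(recourse_cost(x, rng.lognormvariate(2.0, 0.4)) for _ in range(n))
    return {
        "x": x,
        "Z_hat": C * x + statistics.fmean(h),  # sample estimate of Z(x)
        "std_h": statistics.pstdev(h),         # spread of the recourse cost
        "q95_h": h[int(0.95 * n) - 1],         # empirical 95th percentile
        "max_h": h[-1],                        # worst sampled scenario
    }


for x in (8.0, 10.0, 12.0):
    print(summarize(x))
```

Two decisions with similar estimated Z(x) can differ noticeably in `std_h` and `q95_h`, which is the kind of information a decision-maker might weigh alongside expected cost.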

This  research  will  focus  on  a  special  and  important  class  of  (1.1)  —  the 
capacity  expansion  problem  —  and  examine  its  results  on  a  collection  of 
problems  provided  by  Morton  (1994c)  and  downloaded  through  the  Internet 
(http://www.engin.umich.edu/~dholmes).  Table  1.1  gives  a  description  of  the  size 
of  the  problem  sets  denoted  as  APL1P,  PGP2,  CEP1,  4TERM,  and  20TERM. 

Table 1.1

Problem Set Descriptions

Problem   # of x      # of Random   # of            Rows/Cols   Rows/Cols
Set       Variables   ω/T           Scenarios       in A†       in W
-------   ---------   -----------   -------------   ---------   ---------
APL1P     2           5             1280            2/2         5/9
PGP2      4           3             576             2/4         7/16
CEP1      4           3             216             9/8         7/15
4TERM     15          8             256             3/15        28/146
20TERM    63          40            1.095 × 10^12   3/63        124/764

† Does not include upper bounds on x.


With  the  exception  of  APL1P,  Chapter  5  provides  the  detailed  description  and 
analysis  results  of  these  problems.  Chapters  3  and  4  use  APL1P  as  a 
demonstration  problem  due  to  its  small  size. 

1.3  Dissertation  Outline  and  Contributions 

Current  approaches  that  use  sampling  techniques  to  solve  (1.1)  share  the 
same  basic  philosophy  of  incorporating  such  methods  within  a  linear 
programming  framework.  However,  to  find  the  response  surface  approximations 
of  (1.1)  this  dissertation  proposes  a  different  modeling  framework  by  using  linear 
programming  results  within  a  Monte  Carlo  simulation  environment.  Although 
such  an  approach  provides  a  far  richer  analysis,  it  comes  at  the  expense  of 
increased  computational  burdens.  Consequently,  this  shift  in  modeling  paradigm 
requires using a variety of techniques to efficiently solve recourse problems, which
can be classified in two categories: Optimization and Statistical Analysis.

Figure 1.1 breaks these two categories into six principal topics and shows
how  they  fit  in  the  overall  organization  of  this  dissertation.  This  research 
provides  its  major  contributions  within  these  topics  in  the  following  manner. 

1.  Search  Techniques.  This  dissertation  compares  three  different  optimal 
search  routines  —  Geometric  Simplex,  Projected  Gradient,  and  Parallel 
Tangents  —  for  accuracy  and  convergence.  The  simplex  and  parallel 
tangent approaches have not been tried before in the recourse context, while
the  projected  gradient  method  has  been  discussed  in  the  general  case  of 
stochastic  quasigradient  methods. 

2.  Optimization  Algorithms.  Extending  previous  research  in  the  areas  of 
basis  classification,  this  thesis  completely  enumerates  the  optimal  bases 
for either the entire feasible region or a prescribed subset. This new
technique  offers  a  faster  way  to  calculate  estimates  of  Z(x),  and  for  smaller 
problems  offers  an  order  of  magnitude  increase  in  efficiency. 

3.  Variance  Reduction.  The  literature  reports  limited  use  of  variance 
reduction  techniques  (primarily  importance  sampling)  to  improve  the 
accuracy,  or  ease  the  computational  burden,  of  estimating  Z(x).  This 
research  investigates  two  techniques  —  Control  Variates  and  Latin 
Hypercube. 

4.  Experimental  Design.  This  topic  represents  the  first  known  attempt  to 
analyze  recourse  problems  using  experimental  design  methods  from  the 
simulation  and  engineering  literature. 

[Figure 1.1. Dissertation Overview. Principal topics shown: Search Technique, Optimization Algorithm, Variance Reduction, Experimental Design, Response Surface Analysis, Distribution Analysis.]


5.  Response  Surface  Analysis.  This  topic  employs  Canonical  Analysis  as  the 
principal  method  for  quantifying  the  region  of  optimality,  and  directions  of 
minimum  and  maximum  sensitivity. 

6. Distributional Analysis. This research introduces distributional analysis
to recourse problems, using non-parametric Tolerance Limits to consider
high-cost scenarios.

As  shown  in  Figure  1.1,  Chapter  2  reviews  the  literature  on  stochastic  linear 
programming  and  simulation  optimization,  with  a  special  emphasis  on  solution 
techniques  for  two-stage  recourse  problems.  Chapter  3  presents  the  methodology 
and  underlying  theory  employed  by  the  proposed  solution  algorithms  and  search 
techniques.  Chapter  4  covers  the  statistical  analysis  topics  and  their  application  in 
the  context  of  (1.1).  Chapter  5  reviews  the  capacity  expansion  problem  sets  found 
in  the  literature  and  on  the  Internet,  and  provides  a  response  surface  and 
distributional  analysis  for  each  problem  using  the  proposed  algorithm.  Chapter  6 
summarizes  the  results  and  contributions  of  this  dissertation,  and  concludes  with 
proposals  for  future  research. 


Chapter  2 


Literature  Review 


2.1  Introduction 

The  literature  on  stochastic  optimization  can  be  roughly  categorized  into 
two  broad  arenas  —  stochastic  programming  and  simulation.  This  dissertation 
proposes  solving  the  two-stage  recourse  problem  traditionally  considered  within 
the  domain  of  mathematical  programming  under  uncertainty  by  synthesizing 
results  and  techniques  developed  in  both  camps;  accordingly,  this  chapter  reviews 
previous applicable investigations in these areas in the following manner. Section
2.2 surveys the different modeling variations and assumptions found in the
literature  under  the  general  heading  of  stochastic  linear  programming,  with 
particular  emphasis  on  the  category  of  interest  —  two-stage  programming  with 
recourse.  Section  2.3  reviews  proposed  solution  techniques  for  two-stage 
recourse  problems,  while  Section  2.4  closes  the  chapter  with  a  summary  of  the 
simulation  optimization  literature. 

2.2 Stochastic Linear Programming
2.2.1  General  Description 

Stochastic  linear  programming  was  first  explored  by  Dantzig  (1955)  as 
mathematical  programming  under  uncertainty,  and  independently  investigated  by 
Beale  (1955).  Since  then,  the  term  stochastic  programming  has  come  to 
encompass  a  very  broad  range  of  mathematical  programming  problems  where 
uncertainty exists in one or more of the parameters. Wets distinguishes between
decision making under uncertainty, where little is known quantitatively about the
uncertainty, and decision making under risk (or stochastic programming) as
problems where "...the decision maker is given a description of these unknown
parameters in terms of a well-defined probability law" (Wets 1974). For the
purposes of this research, the term stochastic linear programming (SLP) follows
Kall's (1976) definition as being "...concerned with problems arising when some
or all coefficients of a linear program are stochastic variables with known (joint)
probability distribution."

A  generalized  version  of  SLP  given  by  Ermoliev  and  Wets  shows  the 
difficulty  of  solving  stochastic  programs  with  recourse: 

$$
\begin{aligned}
&\text{find } x \in X \subset \mathbb{R}^{n} \\
&\text{such that } F_i(x) = E\{f_i(x,\omega)\} = \int f_i(x,\omega)\,P(d\omega) \le 0, \quad i = 1,\ldots,m, \\
&\text{and } z = F_0(x) = E\{f_0(x,\omega)\} = \int f_0(x,\omega)\,P(d\omega) \text{ is minimized,}
\end{aligned}
\tag{2.1}
$$

where (Ω, A, P) defines the probability space, and for every x in X, f_i, f_0, and all
expectations are defined. The simplest approach for solving (2.1), scenario
analysis, finds an optimal solution x*(ω^k) for a given scenario ω^k; however,
where there are s scenarios ω^k (k = 1, ..., s) to consider, a convex combination
of the x*(ω^k) (k = 1, ..., s) may prove to be infeasible (Ermoliev and Wets 1988).
Scenario  Optimization,  as  proposed  by  Dembo  (1989),  extends  this  approach  to 
linear  forms  of  (2.1)  by  compensating  for  such  possible  infeasibilities  by  solving 
(2.1) for each scenario k, then optimizing a tracking or coordination model that
minimizes some function of the differences in optimality and feasibility; e.g.,

$$\min\; \sum_{k=1}^{s} p_k \lVert c_k x - v_k \rVert^{2} + \sum_{k=1}^{s} p_k \lVert A_k x - b_k \rVert^{2} \quad \text{s.t. } A_d x = b_d,\; x \ge 0 \tag{2.2}$$

where A_k x = b_k characterizes the constraint set for scenario k, A_d x = b_d represents
deterministic constraints for all scenarios, and v_k is the optimal objective function
value for scenario k (Dembo 1989, 1991). However, many problems require
nonanticipative, or here-and-now, decisions; i.e., choices that must be made once
and for all prior to the realization of ω. In these cases scenario analysis or
scenario  optimization  models  are  inappropriate  (Ermoliev  and  Wets  1988,  Morton 
1994b). 

Ermoliev  and  Wets  also  contrast  the  here-and-now  environment  (which 
requires  an  anticipative  optimization  solution)  to  the  situation  where  the  decision 
maker observes ω prior to deciding x. They term this type of model adaptive
optimization,  and  in  particular  introduce  the  distribution  problem  as  one  of 
finding  the  distribution  function  of  the  optimal  value  of  adaptive  models.  They 
also  introduce  the  two-stage  recourse  model  "...as  an  attempt  to  incorporate  the 
fundamental  mechanisms  of  anticipation  and  adaptation  within  a  single 
mathematical model. In other words, this model reflects a trade-off between
long-term anticipatory strategies and the associated short-term adaptive
adjustments" (Ermoliev and Wets 1988).

Because of their dual anticipative/adaptive nature, recourse formulations
present a robust mechanism for analyzing stochastic programming problems.
Furthermore, the inherent uncertainty of these problems implies a need for
estimating the distribution function of the optimal solution, the confidence
interval of the point estimates of x*(ω), and the objective function value.
Although  such  estimates  are  extremely  difficult  to  find  (Ermoliev  and  Wets 
1988),  this  research  proposes  a  technique  for  such  an  undertaking  for  a  special 
case  of  recourse  problems.  Specifically,  the  research  focuses  on  solving  this  area 
of  SLP  known  as  the  distribution  problem  with  respect  to  stochastic  linear 
programming  problems  with  relatively  complete  recourse.  The  problem  set 
contains  a  specific  and  important  class  of  recourse  problems  that  model  capacity 
expansion.  This  section  reviews  the  appropriate  literature  for  this  class  of 
stochastic  linear  problems  and  provides  the  context  in  which  this  research  is 
conducted.  In  addition  to  Ermoliev  and  Wets  (1988),  general  introductions  and 
overviews  of  the  field  can  be  found  in  Birge  and  Mulvey  (1994),  Dempster 
(1980), Hansotia (1980), Kall (1976), Kall and Wallace (1994), and
Stancu-Minasian and Wets (1976).

2.2.2  Stochastic  Linear  Programming  with  Recourse 

Dantzig  (1955)  first  proposed  what  the  literature  now  calls  stochastic 
linear programming with recourse. In the general recourse case, a first-stage
decision  must  be  made  without  knowledge  of  the  values  of  a  subset  of  the 
problem  parameters.  A  second-stage  (recourse)  decision  follows  after  making  the 
first-stage decision and realizing the random variable(s). Mathematically, the
two-stage recourse version of (2.1) can be expressed as

$$\min\; Z(x) = cx + E[h(x,\omega,T)] \quad \text{s.t. } x \in X,\; X = \{x : Ax = b,\; x \ge 0\} \tag{2.3a}$$

where

$$h(x,\omega,T) = \min\; dy \quad \text{s.t. } Tx + Wy = \omega,\; y \ge 0; \tag{2.3b}$$

x and y are the first- and second-stage decision variable vectors, respectively; c
and d the respective cost vectors for x and y; A and b the matrix of technological
coefficients and right-side resource vector, respectively, for the first-stage
problem; and W and ω the recourse matrix and random right-side resource vector,
respectively, for the second-stage problem. The matrix T (which in certain
formulations can contain random components) represents the amount of resource
consumed in the second stage based upon the first-stage decision x, and E is the
expectation operator. The objective in (2.3a) is to find x* ∈ X such that Min Z(x)
= Z(x*).
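
To make the structure of (2.3) concrete, here is a minimal sketch for a hypothetical instance with one first-stage variable and a discrete random right-hand side. The data, the cost values `C` and `D`, and the choices T = 1 and W = [1, -1] are assumptions for illustration only. With a finite scenario set, Z(x) can be evaluated exactly by enumeration, and the printed second differences show the piecewise linear convex shape.

```python
from itertools import product

# Hypothetical toy data: the random right-hand side is the sum of two independent demands.
OMEGA_1 = [(2.0, 0.5), (6.0, 0.5)]                # (value, probability)
OMEGA_2 = [(1.0, 0.25), (3.0, 0.5), (5.0, 0.25)]  # (value, probability)
C, D = 1.0, 3.0  # c: first-stage cost; d: cost of the shortfall recourse variable


def h(x, omega):
    """Second stage: min D*y1 s.t. x + y1 - y2 = omega, y >= 0 (the surplus y2 costs nothing)."""
    return D * max(omega - x, 0.0)


def Z(x):
    """Z(x) = c*x + E[h(x, omega)], summing over every joint scenario."""
    total = C * x
    for (w1, p1), (w2, p2) in product(OMEGA_1, OMEGA_2):
        total += p1 * p2 * h(x, w1 + w2)
    return total


grid = [float(k) for k in range(13)]
values = [Z(x) for x in grid]
second_diffs = [values[k - 1] - 2 * values[k] + values[k + 1] for k in range(1, len(grid) - 1)]
x_star = min(grid, key=Z)
print("Z on grid:", values)
print("second differences (all >= 0 for a convex function):", second_diffs)
print("grid minimizer:", x_star, "with Z =", Z(x_star))
```

Enumeration is only workable here because there are six joint scenarios. The scenario counts in Table 1.1 indicate why sampling becomes necessary for larger problems.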

There exist numerous variations in the literature on the structure of the
recourse  problem  in  (2.3)  due  to  differences  in  the  number  and  type  of  stochastic 
parameters,  distributional  assumptions,  type  of  recourse  available,  and  the 
presence  of  integer  decision  variables.  Walkup  and  Wets  (1967),  and  Wets  (1974) 
provide  one  set  of  classifications  based  on  recourse  type  and  availability,  and  on 
the  location  of  the  random  parameters. 

1. Simple Recourse. This type of recourse problem differs from (2.3) in that
while the decision vector x is still decided prior to the realization of the
random variables, the formulation assumes a simpler recourse W = [I, -I].

2. Fixed Recourse. This term refers to the formulation given in (2.3), where
the resource vector ω and matrix T may be random, but W must be fixed.

3.  Complete  Recourse.  This  condition  implies  that  the  second-stage  problem 
has  a  feasible  solution  for  any  right-side  value.  A  relaxation  of  this 
condition  —  Relatively  Complete  Recourse  —  indicates  that  a  feasible 
second-stage  solution  exists  for  any  feasible  first-stage  solution  x. 

This  dissertation  restricts  its  research  to  the  class  of  problems  of  the  form  (2.3) 
that  possess  relatively  complete  recourse. 
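
For the simple recourse case in item 1, the second-stage problem can be written out in closed form. The derivation below is a sketch under added assumptions: the recourse vector is split as y = (y^+, y^-) with cost vector d = (q^+, q^-), names that are introduced here for illustration and do not appear in (2.3), and (t)^+ denotes max(t, 0).

```latex
\begin{aligned}
h(x,\omega,T)
  &= \min_{y^{+},\,y^{-} \ge 0}
     \left\{ q^{+}y^{+} + q^{-}y^{-} \;:\; y^{+} - y^{-} = \omega - Tx \right\} \\
  &= \sum_{i} \left[ q_i^{+}\,\bigl(\omega_i - (Tx)_i\bigr)^{+}
     + q_i^{-}\,\bigl((Tx)_i - \omega_i\bigr)^{+} \right],
  \qquad \text{provided } q_i^{+} + q_i^{-} \ge 0 \text{ for all } i .
\end{aligned}
```

Because every right-side value ω - Tx can be written as y^+ - y^- with nonnegative components, simple recourse satisfies the complete recourse condition of item 3. The condition on q^+ + q^- keeps the second-stage problem bounded.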

2.2.3  Chance-Constrained  Stochastic  Programming 

Although  not  directly  addressed  by  this  research,  several  important 
variations of SLP should be briefly noted. One such class of models is
chance-constrained programming, first proposed by Charnes and Cooper (1959), which
views  the  problem  as  finding  a  solution  that  does  not  exceed  a  given  probability 
of  violating  one  or  more  constraints.  Mathematically,  the  chance-constrained 
version  of  (2.1)  can  be  expressed  as 

$$
\begin{aligned}
\min\; & cx \\
\text{s.t. } & \operatorname{Prob}\{Ax \ge b\} \ge \alpha \\
\text{or } & \operatorname{Prob}\{A_i x \ge b_i\} \ge \alpha_i, \quad i = 1,\ldots,m, \\
& x \ge 0,
\end{aligned}
\tag{2.4}
$$

where α or α_i is the confidence coefficient for the entire constraint set or the ith
constraint, respectively, and A and b can be all, or in part, stochastic. Kall (1976)
shows that chance-constrained and two-stage recourse problems are not always
equivalent because, unlike recourse programs, chance-constrained
programs are not, in general, convex programming problems. Furthermore, he
shows that even if the convexity condition holds for (2.4) the computational
requirements remain formidable. Further information on chance-constrained
programming can be found in Garstka (1980), Kirby (1970), Prekopa (1970), and
Vajda (1980).
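
The joint form of the chance constraint in (2.4) can be checked for a given x by simulation. The sketch below is illustrative only: the single random constraint row, its uniform coefficient distributions, the cost vector, and the candidate points are all assumptions. It estimates Prob{Ax ≥ b} for each candidate and keeps the cheapest one that meets the confidence level.

```python
import random

ALPHA = 0.95       # required confidence level (hypothetical)
COST = (1.0, 1.0)  # cost vector c (hypothetical)
B = 10.0           # deterministic right-hand side (hypothetical)


def satisfaction_probability(x, n=50000, seed=7):
    """Monte Carlo estimate of Prob{A x >= b} for one random constraint row A."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        a1 = rng.uniform(0.8, 1.2)  # assumed distribution of the first coefficient
        a2 = rng.uniform(1.5, 2.5)  # assumed distribution of the second coefficient
        hits += a1 * x[0] + a2 * x[1] >= B
    return hits / n


candidates = [(4.0, 3.0), (5.0, 3.5), (6.0, 4.0)]
feasible = [x for x in candidates if satisfaction_probability(x) >= ALPHA]
best = min(feasible, key=lambda x: COST[0] * x[0] + COST[1] * x[1])
print("estimated probabilities:", [round(satisfaction_probability(x), 3) for x in candidates])
print("cheapest candidate meeting the chance constraint:", best)
```

This only screens a fixed list of candidates. Optimizing over x directly is harder because, as noted above, the feasible set defined by a chance constraint need not be convex.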

2.2.4  Robust  Optimization 

Mulvey,  Vanderbei,  and  Zenios  (1991)  suggest  another  form  of  stochastic 
programming  that  includes  an  explicit  set  of  error  vectors  in  the  recourse 
formulation;  i.e., 

$$
\begin{aligned}
h(x,S,T) = \min\; & \sigma(y) + \rho^{+}(z^{+}) + \rho^{-}(z^{-}) \\
\text{s.t. } & W y^{i} + z^{+} - z^{-} = S^{i} - Tx \quad \text{for } i = 1,\ldots,s, \\
& y \ge 0.
\end{aligned}
\tag{2.5}
$$

The error vectors z+ and z- in (2.5) provide recourse in addition to W; however,
the penalty functions ρ+(·) and ρ-(·) can be adjusted to ensure that z+ and z- enter
the basis only if there does not exist any y such that Wy = S - Tx and y ≥ 0. In
effect, this extension, called robust optimization, guarantees the condition of
relatively complete recourse. Their other advancement includes formulating
higher moments than the expected value in the objective function for discrete
versions through the function σ(·). By weighting the variance term in (2.5) they
construct an efficient frontier that describes a robust tradeoff between risk and
reward (Mulvey, Vanderbei, and Zenios 1991).

2.3  Solution  Techniques  for  SLP  with  Recourse 
2.3.1  Approximations  and  Bounds 

Much  of  the  research  since  Dantzig  (1955)  and  Beale  (1955) 
independently  proposed  the  recourse  problem  focuses  on  solution  techniques  to 
(2.3).  Morton  provides  a  concise  categorization  of  these  methods  into  three  areas: 
Exact  Solution,  Approximation  and  Bounding,  and  Sampling  (Morton  1994a). 
Morton  metaphorically  compares  these  approaches  to  how  we  solve  integration 
problems:  (1)  first,  we  would  typically  try  to  solve  an  integration  problem 
analytically to get the exact solution; (2) next, for a more difficult integral we
would attempt a numerical approximation such as Simpson's Rule; and (3) finally,
we would resort to Monte Carlo sampling methods (Morton 1994b). Although exact methods are
preferred,  the  literature  indicates  that  such  an  approach  is  rarely  practical. 
Consequently,  approximation  and  bounding  techniques  become  necessary. 
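
The integration analogy can be made concrete with a small sketch. The integrand e^x on [0, 1], the number of Simpson subintervals, and the sample size are arbitrary choices for illustration. The three results correspond to the exact, approximation, and sampling categories.

```python
import math
import random


def simpson(f, a, b, n=10):
    """Composite Simpson's rule with an even number n of subintervals."""
    if n % 2:
        raise ValueError("n must be even")
    step = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * step)
    return total * step / 3


def monte_carlo(f, a, b, n=100000, seed=0):
    """Plain Monte Carlo: (b - a) times the average of f at uniform random points."""
    rng = random.Random(seed)
    return (b - a) * sum(f(rng.uniform(a, b)) for _ in range(n)) / n


exact = math.e - 1.0  # analytic value of the integral of exp(x) over [0, 1]
approx = simpson(math.exp, 0.0, 1.0)
sampled = monte_carlo(math.exp, 0.0, 1.0)
print(f"exact={exact:.6f} simpson={approx:.6f} monte_carlo={sampled:.6f}")
```

The deterministic rule is far more accurate per function evaluation in one dimension. Sampling becomes attractive when the dimension of the integral, here the number of random parameters, makes grid-based rules impractical.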

Unfortunately,  for  most  situations  developing  an  approximation  is 
necessary  —  and  difficult.  In  their  introduction  to  approximation  techniques  for 
stochastic programming, Kall, Ruszczynski, and Frauendorfer (1988) note that the
difficulty of solving the integral forms in (2.1) directly has led to approximation
methods  for  the  vast  majority  of  cases  where  certainty  equivalents  or  small  finite 
distributions  do  not  exist.  A  review  of  the  literature  shows  that  approximation 
and  bounding  algorithms  dominate  the  research  community's  efforts  to  solve 
stochastic linear programming, with particular focus on efficient implementations
of  decomposition  algorithms,  parallel  optimization,  sampling  procedures, 
stopping  rules,  and  bounding. 

Regarding approximation techniques, Kall, Ruszczynski, and Frauendorfer
note for the general case of (2.1) that all such methods share three fundamental
issues: (1) first, any approximation approach must correctly substitute ω with a
discrete representation; (2) next, it must develop some measure of its accuracy;
and (3) it should provide a way to improve on that accuracy by finding a better
approximation of the original random vector ω. They point out that
accomplishing the first task requires sampling the probability space to find an
accurate discretized version of ω, a task made difficult for several reasons. First,
there exists the fundamental problem of not knowing beforehand how much is
enough; e.g., how detailed a partition of the probability space is necessary.
Second, the degree of non-linearity influences the degree of partitioning necessary
for accurate approximation. Finally, the properties of f_0(x) vary with x; hence, the
sampling itself depends on x (Kall, Ruszczynski, and Frauendorfer 1988).

Much  of  the  recent  literature  on  bounding  algorithms  for  the  recourse 
problem  devotes  its  efforts  to  improving  the  error  estimate  associated  with  the 
upper  bound.  This  effort  stems  from  the  fact  that  while  calculating  the  lower 
bound  under  Jensen's  inequality  generally  requires  only  a  small  number  of 
evaluations  for  convergence,  the  upper  bound  calculations  require  evaluating 
h(x,ω,T) at every extreme point of the probability space, which for m random
components in ω and T requires solving h(x,ω,T) 2^m times (Birge and Wets
1989). Additional problems stem from assumptions of independence; however,
both the empirical evidence and theoretical results imply that Jensen's inequality
provides a better estimate of the optimal value of the recourse problem (Gassmann
and Ziemba 1986). Consequently, the literature offers several extensions of these
basic approximations.
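
The two bounds discussed above can be sketched for a fixed x and a convex recourse function of independent, bounded random components. Everything specific here is an assumption for illustration: the supports, the uniform distributions, the cost function, and the use of the standard independent-component form of the Edmundson-Madansky bound, in which each corner of the support box is weighted by the product of (b - mean)/(b - a) or (mean - a)/(b - a).

```python
from itertools import product
import random

BOUNDS = [(0.0, 4.0), (1.0, 5.0), (2.0, 8.0)]  # supports [a_i, b_i], hypothetical
MEANS = [2.0, 3.0, 5.0]                        # means of independent uniforms on those supports
X, Q = 9.0, 4.0                                # fixed first-stage decision and recourse cost


def h(omega):
    """Convex toy recourse cost: pay Q per unit by which total demand exceeds X."""
    return Q * max(sum(omega) - X, 0.0)


def jensen_lower():
    """Jensen: h evaluated at the mean is a lower bound on E[h] for convex h."""
    return h(MEANS)


def edmundson_madansky_upper():
    """Weighted sum of h over all 2**m corners of the support box."""
    total, evaluations = 0.0, 0
    for vertex in product(*BOUNDS):
        weight = 1.0
        for v, (a, b), mu in zip(vertex, BOUNDS, MEANS):
            weight *= (b - mu) / (b - a) if v == a else (mu - a) / (b - a)
        total += weight * h(vertex)
        evaluations += 1
    return total, evaluations


def monte_carlo(n=20000, seed=3):
    """Sampling estimate of E[h], for comparison with the two bounds."""
    rng = random.Random(seed)
    return sum(h([rng.uniform(a, b) for a, b in BOUNDS]) for _ in range(n)) / n


lower = jensen_lower()
upper, evaluations = edmundson_madansky_upper()
estimate = monte_carlo()
print(f"Jensen lower={lower:.3f}  sampled E[h]={estimate:.3f}  E-M upper={upper:.3f}")
print("recourse evaluations for the upper bound:", evaluations)
```

The lower bound needs a single evaluation, while the upper bound needs one evaluation per corner. The corner count doubles with each added random component, which is the computational burden described above.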

Birge  (1985)  proposes  aggregating  rows  and  columns  of  the  original 
recourse  problem  to  reduce  the  computational  complexity.  Frauendorfer  (1988) 
supplies  a  straightforward  extension  of  the  Edmundson-Madansky  upper  bound 
inequality to the case of dependent distributions among the elements of ω. Birge
and Wets (1986) provide an extensive catalogue of approximation methods based
upon linearization of an objective function (called original, subgradients, rays,
and pairs) and one of several techniques for obtaining discrete probability
measures  (called  conditional  expectations,  extreme  points,  extremal  probability 
measures,  and  majorizing  probability  measures),  together  with  guidelines  for 
applying  them  to  fixed  recourse  problems  in  conjunction  with  optimization 
solution  methods.  Gassmann  and  Ziemba  (1986)  suggest  using  linear 
programming  on  a  partition  of  the  probability  space  that  provides  a  tighter  upper 
bound,  a  technique  extended  by  Edirisinghe  and  Ziemba  (1992)  to  include 
variable  and  constraint  aggregation,  and  multi-stage  recourse  applications.  Birge 
and  Wallace  (1988),  using  the  ray  approximation  procedure  from  Birge  and  Wets 
(1986),  develop  upper-bound  approximations  whose  computational  requirements 
are polynomial in the dimensionality of ω (also see Birge and Wets 1989).
Finally, Birge and Dula (1991) propose an extension of this sublinear
approximation to include non-linear recourse problems with first- and second-moment
information on the elements of ω.
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A  related  area  of  interest  regarding  the  upper  and  lower  bounds  of  the 
recourse  problem  deals  with  the  value  and  state  of  information  regarding  the 
uncertain  parameters.  Specifically,  the  literature  addresses  two  fundamental 
questions  in  this  area:  (1)  the  value  of  additional  information  regarding  the 
uncertainty  in  co;  and,  (2)  the  degree  of  error  between  a  deterministic 
approximation  and  its  more  accurate  stochastic  counterpart.  First  addressed  by 
Madansky  (1960)  and  Avriel  and  Williams  (1970),  Birge  (1982)  offers  a  summary 
of  their  work  on  expected  value  of  perfect  information  (EVPI)  and  presents  the 
value  of  the  stochastic  solution  (VSS).  Rewriting  (2.3),  Birge  reviews  previous 
results for the problem

\[
\phi(x,\omega,T) = cx + \min\,[\,dy \mid Tx + Wy = \omega,\ y \ge 0\,]
\quad \text{s.t. } x \in X,\ X = \{x : Ax = b,\ x \ge 0\}. \tag{2.6}
\]

The expected value for the wait-and-see solution (WS) is

\[ \mathrm{WS} = E[\min \phi(x,\omega,T)], \quad x \in X, \tag{2.7} \]

and the expected value for the recourse problem (RP) is

\[ \mathrm{RP} = \min\,[E\,\phi(x,\omega,T)], \quad x \in X. \tag{2.8} \]

The expected value approximation (EV) is defined as

\[ \mathrm{EV} = \min \phi(x,E[\omega],E[T]), \quad x \in X, \tag{2.9} \]

and where $\bar{x}$ is the optimal solution to (2.9), the expected result of using
$\bar{x}$ is

\[ \mathrm{EEV} = E[\phi(\bar{x},\omega,T)], \quad \bar{x} \in X. \tag{2.10} \]

Consequently, the following bounds

\[ \mathrm{EEV} \ge \mathrm{RP} \ge \mathrm{WS} \ge \mathrm{EV} \tag{2.11} \]

hold due to the convexity of $\phi(x,\omega,T)$ and Jensen's inequality. From (2.11) Birge
shows that

\[
\mathrm{EVPI} = \mathrm{RP} - \mathrm{WS}, \qquad
\mathrm{EEV} - \mathrm{EV} \ge \mathrm{EVPI} \ge 0 \tag{2.12}
\]

and from Avriel and Williams (1970) repeats the suggestion that (2.12) provides a
bound on the value of more information regarding $\omega$ and T. Birge then suggests
another measure, the value of the stochastic solution (VSS), defined as

\[
\mathrm{VSS} = \mathrm{EEV} - \mathrm{RP}, \qquad
\mathrm{EEV} - \mathrm{EV} \ge \mathrm{VSS} \ge 0 \tag{2.13}
\]

to establish the worth and value bounds for solving more complicated recourse
models (Birge 1982). Additional research on information costs includes $\omega$ and T
with discrete distributions (Baron 1971), lower bounds for EVPI (Morris and
Thompson 1980), and bounds for linear and concave preference functions
(Hausch and Ziemba 1983).
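
The quantities in (2.7) through (2.13) can be illustrated on a very small instance. The sketch below is not taken from Birge (1982); it assumes a hypothetical one-dimensional problem with simple recourse (unit cost C, shortage cost Q, surplus cost H, first-stage capacity CAP, and three equally likely realizations of $\omega$), all of which are illustrative choices. Because $\phi$ is piecewise linear and convex in x for this instance, each minimization only needs to examine kinks and endpoints.

```python
from fractions import Fraction

# Hypothetical data (assumptions for illustration only).
C, Q, H, CAP = 1, 3, 1, 4
SCENARIOS = [(Fraction(1, 3), 1), (Fraction(1, 3), 2), (Fraction(1, 3), 6)]


def phi(x, omega):
    """First-stage cost plus the optimal simple-recourse cost for one realization."""
    return C * x + Q * max(omega - x, 0) + H * max(x - omega, 0)


def candidates(points):
    """phi is piecewise linear and convex in x, so a minimizer over [0, CAP]
    lies at an endpoint or at one of the kink points supplied."""
    return sorted({0, CAP} | {p for p in points if 0 <= p <= CAP})


def expected(func):
    """Expectation of func(omega) over the discrete scenario set."""
    return sum(p * func(w) for p, w in SCENARIOS)


def information_measures():
    omegas = [w for _, w in SCENARIOS]
    mean = expected(lambda w: w)
    # WS (2.7): optimize separately for each realization, then average.
    ws = expected(lambda w: min(phi(x, w) for x in candidates([w])))
    # RP (2.8): one x chosen before omega is known.
    rp = min(expected(lambda w, x=x: phi(x, w)) for x in candidates(omegas))
    # EV (2.9): replace omega by its mean and solve the deterministic problem.
    x_bar = min(candidates([mean]), key=lambda x: phi(x, mean))
    ev = phi(x_bar, mean)
    # EEV (2.10): expected cost of actually using x_bar.
    eev = expected(lambda w: phi(x_bar, w))
    return {"WS": ws, "RP": rp, "EV": ev, "EEV": eev,
            "EVPI": rp - ws, "VSS": eev - rp}
```

For this assumed instance the chain (2.11) holds strictly, and both EVPI and VSS fall below EEV - EV as (2.12) and (2.13) require.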
2.3.2 Exact Solution Methods


According to Morton's classification, exact solution methods "...include
simplex-based algorithms that exploit special structure of bases...decomposition
or L-shaped schemes...interior point methods...and the Progressive Hedging
algorithm..." (Morton 1994a). To understand how and why these solution methods
can be applied to (2.3), its general characteristics (such as convexity) need to be
determined. Wets (1966a) accomplished this characterization by showing that
(2.3a) is convex and continuous for that subset of solutions $x \in X$ when (2.3a) is
feasible for all realizations of $\omega$ and T. Furthermore, he shows that the expected
value of $\pi(\omega - Tx)$ can be used to construct a supporting hyperplane to (2.3a)
(Wets 1966a). These important results theoretically clear the way for solving
(2.3a) with variants of the major decomposition algorithms and show that a local
minimum of (2.3a) is also a global minimum. (In a related paper Wets (1966b) also
shows that the solution set to (2.3a) is both convex and polyhedral.)

Dantzig and Madansky (1960) first proposed applying the Dantzig-Wolfe
decomposition algorithm to the dual of a special case of (2.3)

\[
\begin{array}{rl}
\min & cx + p_1 d_1 y_1 + p_2 d_2 y_2 + \cdots + p_s d_s y_s \\
\text{s.t.} & Ax = b \\
 & T_1 x + W_1 y_1 = \omega_1 \\
 & T_2 x + W_2 y_2 = \omega_2 \\
 & \qquad \vdots \\
 & T_s x + W_s y_s = \omega_s \\
 & x \ge 0;\ y_i \ge 0,\ i = 1, \ldots, s
\end{array}
\tag{2.14}
\]

where there exists a finite number of $\omega_i$ ($i = 1, 2, \ldots, s$) with known probability
$p_i$. Wets (1966a) proposes a modification to the dual of (2.14) whereby the
normal dual variables $\pi_i$ are replaced with the scaled variables $(1/p_i)\,\pi_i$. The
subsequent dual set of inequalities corresponding to the column vectors associated
with x in the primal then forms the master problem in the dual, while the set of
constraints corresponding to decision variable $y_i$ in (2.14) creates subproblem i in
the dual. Wets also discusses another version of (2.14) whereby subtracting the
kth row $T_k x + W_k y_k = \omega_k$ from the (k+1)th row
$T_{k+1} x + W_{k+1} y_{k+1} = \omega_{k+1}$ generates a staircase system with the same
constraints as in the recourse section of (2.14) (except for
$T_1 x + W_1 y_1 = \omega_1$), thus providing a simpler form for computation (Wets 1966a).
(Also see Wets (1974, 1988) for a more recent summary of his results.)
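
The block structure of (2.14) is what the decomposition methods below exploit. As a minimal sketch (the function name `extensive_form` and the data layout are assumptions made here for illustration, not part of any cited work), the following assembles the cost vector, constraint rows, and right-hand side of (2.14) from per-scenario data, so the dual block-angular pattern can be inspected directly.

```python
def extensive_form(A, b, c, scenarios):
    """Assemble the deterministic equivalent (2.14).

    A, b, c   : first-stage data (A as a list of rows).
    scenarios : list of (p, d, T, W, omega) tuples, one per realization,
                with T and W given as lists of rows (hypothetical layout).
    Returns (cost, rows, rhs) over the variables (x, y_1, ..., y_s).
    """
    n = len(c)
    widths = [len(d) for _, d, _, _, _ in scenarios]
    total = n + sum(widths)

    # Objective: c x + sum_i p_i d_i y_i
    cost = list(c)
    for p, d, _, _, _ in scenarios:
        cost += [p * dj for dj in d]

    # First-stage rows: A x = b, zero in every recourse column.
    rows = [list(row) + [0] * (total - n) for row in A]
    rhs = list(b)

    # One block of rows per scenario: T_i x + W_i y_i = omega_i.
    offset = n
    for (_, _, T, W, omega), width in zip(scenarios, widths):
        for t_row, w_row, w in zip(T, W, omega):
            row = list(t_row) + [0] * (total - n)
            row[offset:offset + width] = list(w_row)
            rows.append(row)
            rhs.append(w)
        offset += width
    return cost, rows, rhs
```

Each scenario adds its own columns and rows, so the matrix grows linearly in the number of realizations s; this is the size problem that motivates solving the blocks separately.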

In 1969, Van Slyke and Wets developed the L-shaped algorithm based on
the immediate result of Wets' (1966b) proof that if "...the set of feasible decisions,
represented by an n-vector x, is a convex polyhedral subset of $R^n$...at most a finite
number of linear constraints must be added to the problem (2.3a) to determine the
set of feasible decisions" (Van Slyke and Wets 1969). For such a finite case their
algorithm iteratively adds two types of constraints to (2.3a), both in terms of x,
that: (1) reduce the region of (2.3a) as necessary to guarantee feasibility for
(2.3b) (feasibility cuts); and (2) produce an optimal solution to (2.3a) through the
dual variables found in solving the recourse problem (2.3b) (optimality cuts).
Unfortunately, Van Slyke and Wets point out that for the continuous case of $\omega$
(including T), developing the second set of constraints requires knowing the
entire description of its probability space, and can easily lead to an infinite
number of simplex multiplier-based constraints. Furthermore, they note that even
problems with finite distributions of $\omega$ may have a tremendously large set of
values, thus posing practical computational problems. Finally, they suggest
investigating approximation schemes, although such methods may eliminate the
optimal solution through sampling error (Van Slyke and Wets 1969). (Wets
(1972) also provides characterization theorems and algorithms to test whether a
recourse problem is feasible, bounded, and solvable prior to solving it.)
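
The optimality-cut loop described above can be sketched on a toy instance. This is an illustration under strong assumptions, not Van Slyke and Wets' implementation: x is a scalar in [0, cap], T = 1, and recourse is simple with shortage cost q and surplus cost h, so the subproblem dual is available in closed form and recourse is complete (no feasibility cuts are needed). The one-dimensional master problem is solved by enumerating kinks rather than by a simplex code.

```python
def expected_recourse_cut(x, q, h, scenarios):
    """Optimality cut theta >= alpha - beta * x from the subproblem duals at x."""
    # Dual of min{q*y_plus + h*y_minus : y_plus - y_minus = w - x, y >= 0}.
    duals = [q if w > x else -h for _, w in scenarios]
    alpha = sum(p * pi * w for (p, w), pi in zip(scenarios, duals))
    beta = sum(p * pi for (p, _), pi in zip(scenarios, duals))
    return alpha, beta


def solve_master(c, cap, cuts):
    """Minimize c*x + max_k(alpha_k - beta_k*x) over [0, cap].

    The objective is convex piecewise linear in one variable, so it is enough
    to check the endpoints and the pairwise intersections of the cuts.
    """
    points = {0.0, float(cap)}
    for i, (a1, b1) in enumerate(cuts):
        for a2, b2 in cuts[i + 1:]:
            if abs(b1 - b2) > 1e-12:
                x = (a1 - a2) / (b1 - b2)
                if 0.0 <= x <= cap:
                    points.add(x)

    def theta(x):
        return max(a - b * x for a, b in cuts)

    best = min(sorted(points), key=lambda x: c * x + theta(x))
    return best, theta(best)


def l_shaped(c, q, h, cap, scenarios, tol=1e-9, max_iter=50):
    """Return (x, objective) once the master's theta matches the true recourse."""
    x, theta = 0.0, float("-inf")
    cuts = []
    for _ in range(max_iter):
        alpha, beta = expected_recourse_cut(x, q, h, scenarios)
        recourse = alpha - beta * x          # expected recourse cost at x
        if recourse <= theta + tol:          # master estimate is exact: stop
            return x, c * x + recourse
        cuts.append((alpha, beta))
        x, theta = solve_master(c, cap, cuts)
    raise RuntimeError("no convergence within max_iter")
```

On the assumed data c = 1, q = 3, h = 1, cap = 4 with three equally likely realizations 1, 2, and 6, the loop stops after three cuts, which reflects the finite-termination property for finite distributions.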

Garstka and Rutenberg (1973) present one of the first enhancements to the
L-shaped algorithm by developing a more efficient method of calculating the dual
variables of the recourse problem associated with the optimality cuts. Taking
advantage of the lattice structure of a finite distribution of $\omega$, they perform a
parametric ranging of all combinations of the elements (lattice points) in $\omega$
regarding their feasibility for a given optimal basis. They justify this approach by
noting that the information provides a probability estimate for each realization of
$\omega$ that in turn helps calculate the dual variables in the L-shaped algorithm.
Garstka and Rutenberg then incorporate this idea in a basis generation procedure
that removes a vector from the current optimal basis and replaces it under the dual
simplex procedure. At each optimal basis their algorithm systematically classifies
the lattice points according to their feasibility using the above parametric ranging
procedure. Garstka and Rutenberg show their sifting algorithm to be more
efficient than earlier techniques that sequentially try all lattice points against an
optimal basis, then find another optimal basis from which to try those lattice
points that were infeasible under the previous basis, and so on until all realizations
of $\omega$ are allied with an optimal basis (Garstka and Rutenberg 1973). While this
method can provide faster execution of the L-shaped algorithm, it can also
iteratively supply finer discrete approximations for continuous cases of $\omega$ (Wets
1983). Wets (1988) also points out that bunching procedures may be more
appropriate for cases where $\omega$ cannot be represented in a lattice structure, contains
dependent random variables, or derives from an approximation design.

Both approaches share the assumption that a relatively small number of
optimal bases exist for the recourse problem (2.3b) for all realizations of $\omega - Tx$.
Garstka and Rutenberg (1973) claim that even with a large number of lattice
points "...the [recourse] subproblems are very similar indeed, differing only by
some minor change to an element in p' [the random right-side vector]. It seems
likely that a great many of the...optimal bases...will be the same...so there is the
combinatorial problem of spotting them in some systematic manner..." Similarly,
Wets (1983) notes that "...because of the nature of the problem at hand it is
reasonable to expect that only a small number of bases in W (W in (2.3b)) will
suffice to bunch all the realizations." Furthermore, in related areas of stochastic
programming such as the distribution problem, some numerical algorithms
implicitly make the same assumption by parametrically decomposing the sample
space of $\omega$ into decision regions (optimal bases) for later use with updated
probability distributions (Bereanu 1980). One option this dissertation proposes
for solving and characterizing the recourse function incorporates this assumption
of a few optimal bases in the recourse problem. Chapter 3 explains in detail how
the proposed Monte Carlo simulation framework implements this approach for
solving recourse problems.
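
The idea of allying every realization with one of a few optimal bases can be sketched as follows. This is a simplified illustration, not the sifting or bunching procedures of the cited authors: it assumes a hypothetical recourse matrix W = [I, -I] with two rows and a cost vector D, enumerates the dual-feasible bases once, and then assigns each right-hand side r = $\omega$ - Tx to the first basis B for which B^{-1} r >= 0.

```python
from itertools import combinations

# Hypothetical recourse data: columns of W = [I, -I] and their costs d.
COLS = [(1, 0), (0, 1), (-1, 0), (0, -1)]
D = [3, 2, 1, 1]


def solve2(M, r):
    """Solve the 2x2 system M z = r by Cramer's rule (None if M is singular)."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    if abs(det) < 1e-12:
        return None
    return ((r[0] * M[1][1] - M[0][1] * r[1]) / det,
            (M[0][0] * r[1] - r[0] * M[1][0]) / det)


def dual_feasible_bases():
    """Bases B of W whose multipliers pi = d_B B^{-1} satisfy pi W <= d."""
    found = []
    for i, j in combinations(range(len(COLS)), 2):
        Bt = [COLS[i], COLS[j]]                      # rows of B transposed
        pi = solve2(Bt, (D[i], D[j]))                # B^T pi = d_B
        if pi is None:
            continue
        if all(pi[0] * col[0] + pi[1] * col[1] <= dj + 1e-12
               for col, dj in zip(COLS, D)):
            B = [[Bt[0][0], Bt[1][0]], [Bt[0][1], Bt[1][1]]]
            found.append(((i, j), B))
    return found


def bunch(realizations):
    """Group right-hand sides r by the first basis that is optimal for them."""
    bases = dual_feasible_bases()
    groups = {}
    for r in realizations:
        for key, B in bases:
            z = solve2(B, r)
            if min(z) >= -1e-12:                     # primal feasible => optimal
                groups.setdefault(key, []).append(r)
                break
    return groups
```

With these assumed data only four bases are ever needed, however many realizations are supplied, which is the kind of economy the quoted passages anticipate.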
Kall (1979) proposes a basis factorization technique (see Strazicky 1974)
for exact solution methods by taking advantage of the block structure of the dual
of (2.14). He suggests representing the dual basis B of any feasible solution to
(2.14) in the form

\[ B = \begin{pmatrix} T & Y \\ L & Z \end{pmatrix} \tag{2.15} \]

where submatrix T is regular and invertible. Kall shows how all iterations of the
simplex can maintain the form of (2.15) with submatrix T; and, through a
reformulation of the dual of (2.14) constructs a block-diagonal form for T as well.
Where A is mxn, W is nxv, r the number of scenarios or realizations of $\omega$ in
(2.14), Kall (1979) shows the number of operations per simplex iteration on (2.15)
for the factorization method to be on order O(f), where standard pivots on (2.15)
are on order O([r(\ - u) + n]2). However, Birge (1988) shows that this dual basis
factorization requires the same computational effort as the L-shaped method.

Birge and Louveaux (1988) propose an extension of the L-shaped method
that provides (under certain conditions) faster convergence to an optimal solution
than Van Slyke and Wets' original algorithm. Their algorithm recognizes that in
inner linearization (column formation) decomposition algorithms the rate of
optimal solution convergence may increase with multiple (as opposed to single)
column generation proposed by Birge (1985). Although the L-shaped method
performs an outer linearization (row generation) of (2.3a) through the dual
variables of the recourse problem, Birge and Louveaux hypothesize that the same
phenomenon can occur in such outer linearization methods. Consequently, their
extension generates multiple optimality cuts at the same iterative point where Van
Slyke and Wets' algorithm generates one. Furthermore, Birge and Louveaux
establish an upper bound on the number of iterations for the two versions based
on the maximum number of finite convex sets of the recourse polyhedron ($b$), the
number of constraints in the recourse problem ($m_2$), and the number of
realizations of $\omega$ ($K$). These bounds, $1 + K[b^{m_2} - 1]$ for their multi-cut method
versus $(1 + K[b - 1])^{m_2}$ for the L-shaped algorithm, show how the multi-cut
method could have an advantage for large $m_2$ (Birge and Louveaux 1988).
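
The two iteration bounds are easy to compare numerically. The short sketch below simply evaluates the expressions as read above (the exponent placement is a reconstruction of the printed formulas, and the function names are illustrative), so the growth with $m_2$ can be seen for chosen values of K and b.

```python
def multicut_bound(K, b, m2):
    """Worst-case iteration bound 1 + K*(b**m2 - 1) for the multi-cut method."""
    return 1 + K * (b ** m2 - 1)


def single_cut_bound(K, b, m2):
    """Worst-case iteration bound (1 + K*(b - 1))**m2 for the L-shaped method."""
    return (1 + K * (b - 1)) ** m2


def bound_ratio(K, b, m2):
    """How many times larger the single-cut bound is than the multi-cut bound."""
    return single_cut_bound(K, b, m2) / multicut_bound(K, b, m2)
```

For example, with K = 10, b = 2, and m2 = 3 the bounds are 71 and 1331, and the gap widens as m2 grows. These are worst-case bounds only and say nothing about typical iteration counts.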

According to Helgason and Wallace (1991) the L-shaped method
dominated most computational research in stochastic programming until Wets
(1989) and Rockafellar and Wets (1991) proposed another decomposition method
called scenario aggregation. As explained by Wets (1989), scenario aggregation
differs from the earlier decomposition algorithms by its underlying assumption
that the random elements of the problem cannot be accurately described by a
probability distribution. Instead of solving the problem through discrete
approximations based on distributional assumptions, Wets describes this lack of
information leading to scenario analysis, i.e., characterizing the randomness
through relatively few scenarios. Specifically, Wets proposes an algorithm that
finds a solution $x^*$ to the problem

\[
\min \sum_{s \in N} p_s f(x,s) \quad \text{s.t. } x \in \bigcap_{s \in N} C_s \tag{2.16}
\]

where $N = \{s^1, \ldots, s^L\}$ represents the set of scenarios $s$, and $p_s$ is the
probability weight, $C_s$ the set of solutions, and $f(x,s)$ the objective function for
scenario $s$, respectively. Letting $x^{\nu}_s$ represent the optimal solution to the
individual scenario problem Min $f^{\nu}(x,s)$ s.t. $x \in C_s$ at iteration $\nu$, Wets
progressively updates $x^{\nu}_s$ with an average solution
$\hat{x}^{\nu} = \sum_s p_s x^{\nu}_s$ and

\[
f^{\nu}(x,s) = f(x,s) + w^{\nu-1}(s)\,x + \tfrac{1}{2}\rho\,|x - \hat{x}^{\nu-1}|^2 \tag{2.17}
\]

where

\[
w^{\nu}(s) = w^{\nu-1}(s) + \rho\,[x^{\nu}_s - \hat{x}^{\nu}]. \tag{2.18}
\]

Defining an implementable solution as one where $\hat{x}$ is scenario independent and
an admissible solution to be one where $\hat{x}$ is feasible for all $C_s$, he shows that an
optimal solution to (2.16) meets both conditions when $x^{\nu}_s = \hat{x}^{\nu}$ for all
$s \in N$. Wets also argues that unlike a simple averaging of the optimal solutions
$x_s$ to the individual scenario problems $f(x,s)$, a solution to (2.16) allows for the
costs that occur for choosing a given x and realizing scenario s (Wets 1989).
Although not directly related to this dissertation's area of research (two-stage
problem with recourse), it should be noted that Wets (1989) and Rockafellar and
Wets (1991) extend the scenario aggregation policy to multiple periods by
modifying the method to calculate a solution set conditioned upon known or
observable information about the problem in each period. This scenario
aggregation technique that produces an implementable and admissible solution
under such non-anticipativity conditions for multi-period problems is known as the
progressive hedging algorithm (Wets 1989, Rockafellar and Wets 1991). Recent
applications of the progressive hedging algorithm include Lagrangian
approximations of the individual scenarios (Helgason and Wallace 1991) and
stochastic network programming (Mulvey and Vladimirou 1991, 1992).
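
The updates (2.17) and (2.18) can be traced on a deliberately simple case. The sketch below assumes, purely for illustration, scalar x, unconstrained scenario sets $C_s$, and quadratic scenario objectives f(x,s) = 0.5 (x - a_s)^2, so each scenario subproblem has a closed-form minimizer; none of these choices comes from Wets (1989). Under these assumptions the aggregated solution should approach the probability-weighted mean of the a_s.

```python
def progressive_hedging(scenarios, rho=1.0, iterations=60):
    """Scenario aggregation for f(x, s) = 0.5 * (x - a_s)**2 (hypothetical case).

    scenarios : list of (p_s, a_s) pairs with probabilities summing to one.
    Returns (x_hat, xs, w): the implementable average, the scenario solutions,
    and the multipliers w(s).
    """
    w = [0.0] * len(scenarios)
    x_hat = 0.0
    xs = [0.0] * len(scenarios)
    for _ in range(iterations):
        # Minimize f(x,s) + w(s)*x + 0.5*rho*(x - x_hat)**2 for each scenario.
        # Setting the derivative (x - a) + w + rho*(x - x_hat) to zero gives:
        xs = [(a - w_s + rho * x_hat) / (1.0 + rho)
              for (_, a), w_s in zip(scenarios, w)]
        # Aggregate into a scenario-independent (implementable) solution.
        x_hat = sum(p * x for (p, _), x in zip(scenarios, xs))
        # Multiplier update (2.18).
        w = [w_s + rho * (x - x_hat) for w_s, x in zip(w, xs)]
    return x_hat, xs, w
```

The penalty parameter rho trades off how quickly the scenario solutions agree against how quickly their average settles; the value used here is arbitrary.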

2.3.3  Sampling  Methods 

The previous sections discussed solution techniques for (1) cases where
realizations of the random vector $\omega$ are exact, known, and of manageable size;
and (2) approximation methods such as aggregation or partitioning for
distributions of $\omega$ that are either continuous, possess large discrete realizations, or
are only partially known. The literature offers another approach using sampling
techniques  under  two  basic  contexts  —  decomposition  and  stochastic 
quasigradient  (SQG)  algorithms.  In  general,  both  aggregation  and  stochastic 
decomposition  involve  algorithms  that  work  with  only  a  small  fraction  of  the 
overall  sample  space,  thus  implying  a  need  to  determine  solution  quality  (Higle 
and  Sen  1993).  Although  the  SQG  approach  primarily  employs  gradient  search 
techniques,  it  too  only  has  a  small  exposure  to  the  sample  space  that  causes 
problems  in  estimating  the  solution  quality.  The  literature  on  SQG  methods  is 
covered  in  Section  2.3.4. 

Higle and Sen (1991b) propose a variant of the decomposition approach
for two-stage recourse problems based on sampling the random vector $\omega$ of
subproblem (2.3b). For a given feasible solution $x^k$ to (2.3a), Higle and Sen
observe that the dual of (2.3b) is

\[
h(x^k,\omega) = \max\ \pi^k(\omega - Tx^k)
\quad \text{s.t. } \pi^k W \le d,\ \pi^k \text{ unrestricted} \tag{2.19}
\]

and where they define $V$ as the set of all vertices for the dual, $V^k$ as the set of dual
vertices $\{\pi^1, \pi^2, \ldots, \pi^k\}$, and $\omega^t$ an independent sample of $\omega$ for
$t = 1, \ldots, k$, then

\[
\max\{\pi(\omega^t - Tx^k) \mid \pi \in V^k\} \le \max\{\pi(\omega^t - Tx^k) \mid \pi \in V\} \tag{2.20}
\]

and

\[
\pi^k_t(\omega^t - Tx^k) \le h(x^k, \omega^t). \tag{2.21}
\]

Higle and Sen also define a piecewise linear approximation of (2.3b) after k
iterations as

\[
f_k(x) = \max\{\alpha^k_t + (\beta^k_t + c)x \mid t = 1, \ldots, k\} \tag{2.22}
\]

which in turn forms the master program

\[
\min f_k(x), \quad \text{s.t. } Ax = b,\ x \ge 0. \tag{2.23}
\]

Based on the results of (2.20-21), their algorithm at the kth step receives an
optimal solution $x^k$ from (2.23) (based upon k-1 previous cuts) and generates a
new cut k of the form

\[
\alpha^k_k + (\beta^k_k + c)x = cx + \frac{1}{k}\sum_{t=1}^{k} \pi^k_t(\omega^t - Tx) \tag{2.24}
\]

for inclusion in (2.22) at the k+1 iteration. Additionally, they update the previous
cuts using the formula

\[
\alpha^k_t = \frac{k-1}{k}\,\alpha^{k-1}_t, \qquad
\beta^k_t = \frac{k-1}{k}\,\beta^{k-1}_t, \qquad t = 1, \ldots, k-1 \tag{2.25}
\]

in order to provide all generated cuts with current sampling information. Higle
and Sen (1991b) essentially build an approximate outer linearization of (2.3a)
using cuts generated by (2.24) and independent samplings of $\omega$ and T; in this
fashion, they avoid any requirement for finite realizations of $\omega$ and T (or
equivalent discrete approximations) and any distributional assumptions on their
random components.
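
The cut construction in (2.24) and the rescaling in (2.25) can be sketched for a scalar first-stage variable. The code below is an illustration under stated assumptions rather than Higle and Sen's implementation: T is a scalar, the stored dual vertices are passed in as a plain list, and the rescaling presumes (as is commonly assumed for this update) that the recourse cost is bounded below by zero. Only the recourse part of the cut is returned; the term cx is left to the caller.

```python
def sd_cut(x_k, samples, vertices, T=1.0):
    """Recourse part (alpha, beta) of cut (2.24), so the cut reads alpha + beta*x.

    For each sampled omega^t the best stored dual vertex at x_k is selected,
    and the resulting affine functions pi*(omega^t - T*x) are averaged.
    """
    k = len(samples)
    chosen = [max(vertices, key=lambda pi, w=w: pi * (w - T * x_k))
              for w in samples]
    alpha = sum(pi * w for pi, w in zip(chosen, samples)) / k
    beta = -sum(pi * T for pi in chosen) / k
    return alpha, beta


def rescale_cut(cut, k):
    """Update (2.25): shrink an older cut by (k - 1)/k when sample k arrives."""
    alpha, beta = cut
    return alpha * (k - 1) / k, beta * (k - 1) / k


def sample_mean_recourse(x, samples, vertices, T=1.0):
    """Sample average of max over the supplied vertices of pi*(omega - T*x)."""
    return sum(max(pi * (w - T * x) for pi in vertices)
               for w in samples) / len(samples)
```

Because every term of the cut uses a feasible dual vertex, the cut lies on or below the sample-mean recourse function everywhere and touches it at x_k; with only a subset of vertices stored, it remains a lower bound, which is the content of (2.20) and (2.21).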

Higle and Sen (1991b) also address the convergence issue their algorithm
poses due to the randomized nature of the cuts generated in (2.24) and (2.25) by
proposing a modification to their basic algorithm using an incumbent solution
$\bar{x}^{k-1}$ at the kth iteration. Designating this algorithm stochastic decomposition,
they use the incumbent solutions $\bar{x}^{k-1}$ and $f_{k-1}$ to update all previous cuts
(2.25) with the current realization $\omega^k$, then substitute the incumbent solution
$\bar{x}^{k-1}$ with $x^k$ if the condition

\[
f_k(x^k) - f_k(\bar{x}^{k-1}) < r\,\{f_{k-1}(x^k) - f_{k-1}(\bar{x}^{k-1})\}, \tag{2.26}
\]

where $r$ is fixed on the interval [0,1], is met. Higle and Sen also show that at least
one of the test statistics

i  k  1  mk 

or  f  =  (2.27) 

K  i=l  rnk  i=l 

where $m_k$ represents the number of incumbent solutions detected by iteration k,
converges to the optimal solution value, thus suggesting a stopping criterion of the
form
respectively. They also suggest comparing the cardinality of the sets of vertices
$V^k$ and $V^{k+1}$ to the dual (2.19) as a method to prevent premature termination
(Higle and Sen 1991b).

In a separate article, Higle and Sen (1991a) offer additional termination
criteria to the stochastic decomposition algorithm based upon the empirical results
of their algorithm arriving at an optimal solution prior to solution stability as
represented in (2.28). They propose alternative termination methods by either
resampling the observations on $\omega$ (using Lagrangian duality and Kuhn-Tucker
conditions) or resampling the dual vertices of (2.19) (Kuhn-Tucker alone). Their
results from an example power expansion program found that the stopping point
for the three proposed statistical verifications of optimality occurred at fewer
iterations than the original objective function stability measure (2.28). Finally,
Sen, Mai, and Higle (1994) provide a summary of the stochastic decomposition
algorithm within the framework of randomized versions of Kelley's (1960)
cutting plane method and Benders' (1962) decomposition of two-stage linear
programs.

Dantzig and Glynn (1990) propose a method that combines Benders'
decomposition, Monte Carlo sampling, and parallel processing to solve
multi-period problems. Their idea places the master problem under the control of a
single processor that iteratively provides updated solutions x to several parallel
computers responsible for solving the dual subproblems for sample realizations of
$\omega$. The subprocessors in turn update the master program with cuts used to
develop a new estimate of x. They terminate the algorithm when the difference
between the upper bound for Min Z(x) and the lower bound (cuts from the
subproblems) reaches a pre-specified interval. For continuous or large finite cases
of $\omega$, Dantzig and Glynn suggest a Monte Carlo sampling procedure for the
subproblem; and, to improve both the convergence (through variance reduction)
and robustness of the algorithm they incorporate importance sampling of the
scenarios (Dantzig and Glynn 1990; also see Dantzig and Infanger 1991). Sen,
Mai, and Higle's (1994) comparison of Dantzig and Glynn's approach to
stochastic decomposition notes that the former suffers from: (1) using fixed,
independent samples that detract from calculating the error bounds; and (2)
solving one subproblem for each sample $\omega$ at each iteration versus no more than 2
subproblems per iteration for the stochastic decomposition method.

Berger, Mulvey, and Ruszczynski (1993) propose another method for
solving scenario-based models based upon the idea of dualizing the
non-anticipativity constraints (i.e., the first-stage decision variables x in (2.14)) to
produce separable programming problems for each scenario. For example, (2.14)
can be reformulated to read

\[
\begin{array}{rl}
\min & c x_1 + p_1 d_1 y_1 + \cdots + c x_s + p_s d_s y_s \\
\text{s.t.} & A x_1 = b \\
 & T_1 x_1 + W_1 y_1 = \omega_1 \\
 & \qquad \vdots \\
 & A x_s = b \\
 & T_s x_s + W_s y_s = \omega_s
\end{array}
\tag{2.29a}
\]

\[
\begin{array}{l}
x_1 - x_2 = 0 \\
\qquad \vdots \\
x_{s-1} - x_s = 0 \\
x_i \ge 0;\ y_i \ge 0.
\end{array}
\tag{2.29b}
\]

Redefining the notation of (2.29) to where

\[
A_i = \begin{pmatrix} A & 0 \\ T_i & W_i \end{pmatrix}, \tag{2.30}
\]



they let $N_i$ be the matrix for the $i$th period, where element $(j,k)$ of $N_i$ equals 1
if and only if $j = k$ and the $j$th component of $N_i$ shares the same history through
period $i+1$. Under these conditions, a general multi-period recourse formulation
(where the two-stage recourse problem is a special case) can be written as

\[ \min \sum_{i=1}^{s} c_i x_i \tag{2.31a} \]

\[ \text{s.t. } A_i x_i = b_i \tag{2.31b} \]

\[ N_i x_i - N_i x_{i+1} = 0, \quad i = 1, \ldots, n-1 \tag{2.31c} \]

\[ x_i \ge 0, \quad i = 1, \ldots, n. \tag{2.31d} \]

Berger, Mulvey, and Ruszczynski then dualize constraint (2.31c) by dropping
it from the constraint set and replacing (2.31a) with

\[
\sum_{i=1}^{s} c_i x_i + \sum_{i=1}^{s} \pi_i (N_i x_i - N_i x_{i+1})
+ \tfrac{1}{2}\, r \sum_{i=1}^{s} \| N_i x_i - N_i x_{i+1} \|^2. \tag{2.32}
\]

They propose an approximation method for solving the non-separable quadratic
form of (2.32) using a nonlinear interior point method whereby the approximate
solution and the multipliers $\pi_i$ are updated every 2-4 and 100-150 iterations,
respectively (Berger, Mulvey, and Ruszczynski 1993). Berger and Mulvey (1993)
propose an improvement of this algorithm with a restart strategy that addresses
the instability of interior point codes after updating the approximate solution.

2.3.4  Subgradient  Methods 

Sampling methods for recourse problems using probability distributions to
model the random parameters are found in the literature in two basic categories:
(1) the sampling-based algorithms applied in a decomposition context (such as
Higle and Sen's stochastic decomposition algorithm) just reviewed; and (2)
stochastic quasigradient (SQG) algorithms (Morton 1994a). The SQG method
attempts to find the solution to the recourse problem by extending the classic
steepest descent (or gradient search) method found in non-linear programming to
the stochastic programming environment. As explained by Luenberger (1989), in
a deterministic setting we wish to find

\[ x^{k+1} = x^k - a_k \nabla f(x^k) \tag{2.33} \]

where $\nabla f(x^k)$ is the gradient of the function $f(x)$ defined as

\[
\nabla f(x) = \left[ \frac{\partial f(x)}{\partial x_1},\ \frac{\partial f(x)}{\partial x_2},\ \ldots,\ \frac{\partial f(x)}{\partial x_n} \right] \tag{2.34}
\]

and $a_k$ is a scalar that minimizes $f(x^k - a_k \nabla f(x^k))$. If $x \in X$ constrains (2.33), then
a projection gradient method determines the descent direction by projecting the
gradient onto the working surface to maintain feasibility (Luenberger 1989). As
Law and Kelton (1991) point out, (2.34) obviously cannot be directly applied in
the stochastic environment due to random variation of $f(x)$; hence, a gradient
estimation technique (such as replication or perturbation analysis) must be
employed. SQG algorithms extend these stochastic approximation algorithms to
stochastic programming problems (Ermoliev 1983).
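
As a point of reference for the deterministic iteration (2.33), the following sketch applies steepest descent with an exact line search to a quadratic function. The quadratic form is an assumption chosen here because the minimizing step has the closed form a_k = (g'g)/(g'Ag); it is not an example taken from Luenberger (1989), and the function name is illustrative.

```python
def steepest_descent(A, b, x, tol=1e-10, max_iter=1000):
    """Minimize f(x) = 0.5 x'Ax - b'x for a symmetric positive definite A.

    Each pass computes the gradient g = Ax - b, the exact line-search step
    a_k = g'g / g'Ag, and the update x <- x - a_k * g, as in (2.33).
    """
    n = len(x)
    x = list(x)
    for _ in range(max_iter):
        g = [sum(A[i][j] * x[j] for j in range(n)) - b[i] for i in range(n)]
        gg = sum(gi * gi for gi in g)
        if gg <= tol * tol:                  # gradient norm below tolerance
            break
        Ag = [sum(A[i][j] * g[j] for j in range(n)) for i in range(n)]
        step = gg / sum(gi * agi for gi, agi in zip(g, Ag))
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x
```

In the stochastic setting neither the gradient nor the line search is available exactly, which is why the methods reviewed next replace them with statistical estimates and predetermined step sizes.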

Defining $F(x) = E_{\omega}[f(x,\omega)]$ as the expected value of the objective function,
Gaivoronski (1988) provides a general extension of (2.33) for the stochastic
programming problem as

\[
x^{k+1} = \Pi_X[x^k - a_k v^k], \quad k = 0, 1, \ldots \qquad
E[v^k \mid x^0, x^1, \ldots, x^k] = F_x(x^k) + b_k \tag{2.35}
\]

where $\Pi_X$ is the projection operator, $x^k$ is the incumbent solution approximation,
$a_k$ is the step size, and $v^k$ (as a statistical estimate of the subgradient of $F(x)$) is the
stochastic quasigradient of $F(x)$. In general, he notes that the quasigradient $v^k$ can
be estimated as

\[
v^k = \frac{1}{N} \sum_{i=1}^{N} f_x(x^k, \omega^i) \tag{2.36}
\]

using Monte Carlo sampling (with asymptotic error $1/\sqrt{N}$), or where distributional
information on $\omega$ is available he suggests potential asymptotic error rates
approaching $\log(N)/N$. He also observes that SQG methods do not exhibit
monotonicity due to the random nature of (2.36), thus contributing to problems in
determining optimality (i.e., $\|x^{k+1} - x^k\| < \epsilon$), calculating step size $a_k$, and
estimating step direction $v^k$. Finally, Gaivoronski reviews several alternatives to
(2.36) such as random search analogs, finite difference approximations


\[
v^k = \sum_{i=1}^{n} \frac{f(x^k + \delta_k e_i, \omega^k_i) - f(x^k, \omega^k_i)}{\delta_k}\, e_i \tag{2.37}
\]

or central finite differences

\[
v^k = \sum_{i=1}^{n} \frac{f(x^k + \delta_k e_i, \omega^k_i) - f(x^k - \delta_k e_i, \omega^k_i)}{2\delta_k}\, e_i \tag{2.38}
\]

where $e_i$ is the unit vector and $\delta_k$ is a scalar; or, in the case where $F(x)$ is not
differentiable he suggests objective function smoothing, or averaging using

\[
\bar{v}^{k+1} = \frac{1}{M} \sum_{i=k-M+1}^{k} v^i \tag{2.39}
\]

where $M$ is memory size (Gaivoronski 1988). In the case of the recourse function
Ermoliev (1983, 1988) shows that for a single sample $\omega^s$

\[
\xi^k = c + \mu(x^k, \omega^s)\,T \tag{2.40}
\]

where $\mu(x^k, \omega^s)$ are the dual variables for the recourse problem.
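
A projected SQG iteration of the form (2.35), driven by a single-sample dual-based quasigradient, can be sketched as follows. The instance is hypothetical (scalar x in [0, cap], T = 1, simple recourse with shortage cost q and surplus cost h), and the step size a_k = 1/k is one common choice rather than a recommendation from the cited papers. With $\pi$ taken as the maximizer of the dual in the form (2.19), a subgradient of the sampled objective is c - $\pi$T; sign conventions for the dual variables differ between references.

```python
import random


def sqg(c, q, h, cap, omegas, iterations=20000, seed=0):
    """Projected stochastic quasigradient search on a toy recourse problem.

    omegas : equally likely realizations to sample from (assumed data).
    Returns the final iterate x.
    """
    rng = random.Random(seed)
    x = 0.0
    for k in range(1, iterations + 1):
        w = rng.choice(omegas)            # one Monte Carlo sample per step
        pi = q if w > x else -h           # dual solution of the recourse LP
        v = c - pi                        # single-sample quasigradient (T = 1)
        x = x - v / k                     # step size a_k = 1/k
        x = min(max(x, 0.0), cap)         # projection onto X = [0, cap]
    return x
```

The iterates are not monotone in objective value, which is the behavior noted above: successive points oscillate around the minimizer with an amplitude that shrinks only as the step size does, so a stopping rule based on $\|x^{k+1} - x^k\|$ alone mostly reflects the step-size schedule.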

Recent research addresses the major drawbacks of SQG techniques, i.e.,
slow convergence, oscillation in the neighborhood of the optimal solution
(stopping times), projection on X, and selecting the appropriate stepsize.
Ruszczynski and Syski (1986) propose an auxiliary filter that provides aggregate
stochastic subgradients at each iteration. Pflug (1988) reviews both of these
problems in the context of deterministic, adaptive, ratio-of-progress, oscillation,
and inner product tests, as well as providing comparison and implementation
comments. Urasiev (1988) discusses an adaptive procedure using quasigradient
directional information to calculate subsequent stepsizes, while Marti (1988)
proposes a semi-stochastic approximation that under certain cases can partially
restore monotonicity to the objective function. Rockafellar and Wets (1988) give
a method for gradient projection on X that, under certain conditions, does not
require penalization or primal-dual workarounds. In addition to Gaivoronski
(1988), Ermoliev (1988) and Ermoliev and Nurminski (1980) provide general
introductions to SQG methods.
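
The central finite-difference estimator (2.38) reviewed earlier in this section can be sketched in a few lines. The version below is an illustrative variant, not Gaivoronski's exact scheme: it assumes the same set of samples is reused for the forward and backward evaluations in every coordinate (common random numbers) and averages over that set, and the function and parameter names are hypothetical.

```python
def central_difference_gradient(f, x, delta, samples):
    """Estimate the gradient of F(x) = E[f(x, omega)] by central differences.

    f       : callable f(x, omega) returning a float, x given as a list.
    delta   : perturbation size (the scalar delta_k in (2.38)).
    samples : realizations of omega reused for both perturbed evaluations.
    """
    n = len(x)
    grad = [0.0] * n
    for i in range(n):
        up = list(x)
        down = list(x)
        up[i] += delta                    # x + delta * e_i
        down[i] -= delta                  # x - delta * e_i
        total = 0.0
        for w in samples:
            total += (f(up, w) - f(down, w)) / (2.0 * delta)
        grad[i] = total / len(samples)
    return grad
```

Each estimate costs 2n function evaluations per sample, which is the main practical drawback of finite differences when each evaluation requires solving a recourse problem.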

2.4  Simulation  and  SLP  with  Recourse 

Referring to Figure 1.1, the previous sections of this chapter summarize an
extensive amount of research in the categories of Search Techniques and
Optimization Algorithms. In all cases the research objective attempts to provide a
better method for finding the optimal solution, and in many cases does so by
improving the statistical estimation of the stochastic effects within a mathematical
programming framework. However, an extensive review of the literature found
no attempts to model stochastic two-stage linear programs with
fixed and complete recourse from a simulation perspective. Consequently, very
little has been accomplished in investigating the remaining four categories of
Variance Reduction, Experimental Design, Response Surface Analysis, and
Distributional Analysis since these topics naturally arise in a simulation
environment. (The efforts by Dantzig and Glynn (1990), Dantzig and Infanger
(1991), and Infanger (1994) in applying Importance Sampling to stochastic
programming provide one exception in the open literature for using variance
reduction methods.) This absence of research thus provides this dissertation with
its principal focus and contributions, and constitutes the subject matter of
Chapters 4 and 5. Therefore, this section will offer only a general
overview of simulation optimization, and will defer the application of these
techniques in the SLP context to the following chapters.

Azadivar  (1992)  provides  a  brief  overview  of  the  four  major  approaches  to 
using  simulation  as  an  optimization  tool. 

1.  Gradient-Based  Search  Methods  derive  from  traditional  non-linear 
programming;  the  most-often  used  methods  include  finite  difference 
estimation,  infinitesimal  perturbation  analysis,  frequency  domain  analysis, 
and  likelihood  ratio  estimators. 

2.  Stochastic  Approximation  Methods  apply  recursive  search  techniques  that 
converge  on  the  theoretical  minimum  or  maximum. 

3.  Response  Surface  Methodology  fits  a  low-order  polynomial  approximation 
to  the  responses  of  the  simulation,  usually  derived  from  a  formal 
experimental  design. 

4.  Heuristic  Methods  fall  into  two  broad  categories  —  complex  search  and 
simulated annealing (Azadivar 1992).

Within this construct the algorithms described in Chapter 3 incorporate items (1)
and (2) by employing a recursive search technique referred to as the Nelder-Mead
simplex method (Nelder and Mead 1965), and the projected gradient and
PARTAN search methods discussed in Luenberger (1984). Chapter 4 develops
the techniques for fitting and analyzing a response surface (item (3)) to Z(x) in
(2.3a) for a given solution x, while item (4) is not used in this thesis.

Simulation  does  have  some  disadvantages  and  pitfalls.  Summarizing  Law 
and  Kelton  (1991),  these  problems  include: 

1.  Expense.  Simulation  models  of  complex  systems  can  be  costly  and  time- 
consuming  to  develop  and  run. 

2. Stochastic Nature. Because they are inherently stochastic, simulation models
perform better when comparing alternative solutions than when finding
the optimal one.

3.  Bad  Assumptions.  While  false  assumptions  can  derail  any  modeling 
effort,  simulation  is  especially  vulnerable  to  mistaken  probability 
distributions,  false  presumptions  of  independence,  inaccurate 
identification  of  randomness,  biased  random  number  generation,  and  an 
insufficient  number  of  replications. 

4.  Resolution.  Getting  the  right  level  of  detail  is  important  —  and  difficult. 
If  there  is  too  little  detail,  then  the  model  lacks  validity;  too  much,  and  it 
introduces  extraneous  noise  to  the  response,  becomes  expensive  to  code, 
and time-consuming to run (Law and Kelton 1991).

The process of calculating the response surfaces associated with (2.3a) is perhaps
most  susceptible  to  the  drawbacks  of  expense  and  bad  assumptions.  Clearly,  using 
a  repetitive  search  will  be  computationally  more  expensive  than  current  methods; 
however,  this  dissertation  contends  (and  the  research  will  show)  that  the 
additional  information  justifies  the  effort.  As  a  simulation,  the  proposed 
methodology  is  also  vulnerable  to  bad  assumptions,  although  this  drawback 
applies  to  the  most  recent  LP-oriented  approaches  (e.g.,  stochastic  decomposition) 
as  well.  The  research  assumes  that  the  probability  distributions  for  the  test 
problems  in  Chapter  5  are  correct,  and  will  address  the  issues  of  bias  (through 
either  random  number  generation  or  other  sources)  and  insufficient  replications  at 
the  appropriate  point. 

Additional  information  and  details  on  simulation  optimization  techniques 
can  be  found  in  Barton  and  Ivey  (1991),  Biles  and  Swain  (1979),  Box  and  Draper 
(1987),  Fu  (1994),  Jacobson  and  Schruben  (1989),  Kleijnen  (1974,  1987),  Law 
and  Kelton  (1991),  Meketon  (1987),  Pritsker  (1986),  Rubinstein  (1981),  and 
Safizadeh (1990).

Chapter 3


Methodology:  Optimization 


3.1  Introduction 

3.1.1  Overview 

This  chapter  describes  the  optimization  methodology  for  deriving  the 
response  surface  approximation  of  the  expected  optimal  value  of  the  objective 
function  associated  with  a  class  of  two-stage  stochastic  linear  programming 
problems  with  relatively  complete  and  fixed  recourse.  The  following  sections  of 
this  introduction  review  the  notational  form,  principal  terms,  and  major 
definitions  for  the  remainder  of  the  dissertation;  provide  an  exact  description  of 
the  class  of  recourse  problems  investigated  by  this  research;  and,  present  a  sample 
recourse  problem  used  throughout  Chapters  3  and  4  to  illustrate  the  particular 
techniques. After this introduction, Sections 3.2 and 3.3 present, respectively, two
major areas regarding optimality and algorithmic efficiency investigated by this
research: Non-Linear Search Techniques and Linear Programming Algorithms.
The  next  chapter  covers  the  statistical  analysis  topics  of  Variance  Reduction, 
Experimental  Design,  Canonical  Analysis,  and  Distribution  Analysis.  Although 
this  chapter  focuses  on  the  computational  aspects  of  deriving  a  response  surface 
approximation  of  the  expected  value  of  a  two-stage  LP  with  recourse,  when 
appropriate  it  also  provides  the  theoretical  background,  literature  references,  and 
the author's own contributions for the area under study.

3.1.2 Model Description and Basic Definitions

For notational purposes matrices are shown in uppercase boldface with the
subscript identifying the specific matrix (vectors appear in lowercase boldface).
Where a matrix is identified in boldface type, the superscripts label column or row
vectors; conversely, where an element of a matrix appears in non-boldface type
the superscripts describe the location of the element in the matrix. Any additional
subscripts appearing in parentheses refer to that matrix as defined for the equation
named in the parentheses. If no parenthesized subscript appears, then
assume the matrix or element in the context of its most recent definition. The
superscript * identifies that matrix as associated with an optimal solution for an
optimization problem. Subscripting conventions also apply to functional notation.

Example. $\mathbf{G}^{k}_{i(2)}$ refers to the kth row in the ith version of matrix G as that
matrix is defined in the set of equations (2).

Example. $G^{ij}$ refers to the element in the ith row and jth column of matrix
G as most recently defined.

This  dissertation  restricts  its  research  to  a  specific,  but  important,  category 
of  two-stage  programming  under  uncertainty  with  relatively  complete  and  fixed 
recourse  —  the  capacity  expansion  problem.  In  general,  the  capacity  expansion 
problem  involves  optimization  through  a  first-stage  decision  (x)  regarding  the 
amount  of  production  capacity  to  add,  with  a  follow-up  second-stage  vector  (y) 
typically  deciding  the  optimal  resource  allocation  after  the  realization  of  any 
random variables. Examples of this sort of problem from the literature and
the Internet include power expansion, machine capacity expansion, and facility
location problems, which almost always contain uncertainty in the availability of
or requirement for one or more resources (e.g., demand, budget, physical limitations).
Although the second-stage production problem possesses an infinite horizon (i.e.,
periodic and recurring), most models assume (with appropriate present-value
adjustments in the objective function coefficients) that a simplifying single
second-stage model captures the essential behavior of a more complicated multistage
expression of the recourse problem. Also, expansion problems can involve
a sequence of capacity decisions over time, but this research restricts its focus to a
one-time expansion decision formulated in the first stage.

Mathematically,  the  two-stage  capacity  expansion  problem  examined  by 
this  research  is  expressed  as  Min  Z(x)  where 

$$Z(x) = cx + E[h(x, \omega, T)], \quad \text{s.t. } Ax = b,\ x \ge 0 \tag{3.1a}$$

$$h(x, \omega, T) = \min\ dy, \quad \text{s.t. } Wy = \omega - Tx,\ y \ge 0 \tag{3.1b}$$

c and d are cost coefficient vectors for a unit increase in x and the recourse
decision y, respectively; A is the matrix of per unit consumption of resource b by
x; W is the matrix of per unit consumption of resource $\omega$ as adjusted by the vector
Tx; and function $h(x, \omega, T)$ is defined as the recourse problem (3.1b).
Alternatively,  (3.1a)  can  be  expressed  in  a  profit  maximization  form,  where  d 
represents  the  profit  gained  as  offset  by  the  cost  of  expansion  in  c;  however, 
throughout  this  dissertation  the  text  assumes  a  minimization  objective. 
Additionally,  all  formulations  assume  that  a  finite  optimal  solution  exists  for 
(3.1a). Although problem specific, this condition can be explicitly guaranteed by
including a set of error vectors that formally ensure feasibility, while their
presence in any optimal basis indicates a constraint violation (see Mulvey
1993). While uncertainty can also occur in objective function and constraint
coefficients, this dissertation follows the recent literature in restricting random
variation to the right-side vector $\omega - Tx$. In terms of (3.1a) and (3.1b), all
coefficients possess a fixed value except $\omega$ and T, where the model assumes that a
finite mean and variance exist for each element. Finally, no assumption is made
on the independence of the random components of $\omega$ and T.
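To make the structure of (3.1) concrete, the following minimal sketch evaluates Z(x) exactly for a toy single-resource instance. Everything in it is an illustrative assumption rather than part of this research: the scalar capacity x, the costs c = 1 and d = 2, a fixed T = 1, the three demand scenarios, and the function names `recourse_cost` and `expected_cost`. For this toy instance the recourse problem (3.1b) has the closed-form optimum d * max(omega - x, 0), so no LP solver is needed.

```python
def recourse_cost(x, omega, d=2.0):
    """Closed-form optimum of the toy recourse LP:
    min d*y  s.t.  y >= omega - x,  y >= 0."""
    return d * max(omega - x, 0.0)


def expected_cost(x, scenarios, c=1.0, d=2.0):
    """Z(x) = c*x + E[h(x, omega)] for a finite list of (probability, omega) pairs."""
    return c * x + sum(p * recourse_cost(x, omega, d) for p, omega in scenarios)


# Hypothetical demand scenarios: (probability, realization of omega).
SCENARIOS = [(0.25, 1.0), (0.5, 2.0), (0.25, 3.0)]

for x in (0.0, 1.0, 2.0, 3.0, 4.0):
    print(x, expected_cost(x, SCENARIOS))
```

In this toy case the printed values decrease and then increase, with slope changes only at the demand realizations.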

The literature provides two important characteristics of Z(x): (1) Z(x) is a
convex function of x (piecewise linear and convex for finite
realizations of $\omega$ and T); and, (2) the only true decision vector is x, since y
deterministically follows the decision vector x and the realization of the random
variables in $\omega$ and T. Consequently, a second-order polynomial approximation
can be fit to the true function Z(x) in the region of optimality (defined shortly)
using estimates derived from a Monte Carlo simulation. Indeed, in theory during
the course of the simulation of Z(x) several responses to x could be approximated:

1. Point estimates (and associated upper and lower confidence limits) of
the first and second moments of the distribution describing $h(x, \omega, T)$.

2. The relative frequency of different bases of the recourse problem $h(x, \omega, T)$
being optimal.

3. By extension of item (2), the estimated values of y for (3.1b).

However, for several reasons this research will fit a response surface only to Z(x).
First, the primary focus of this work is to establish, for the first time, the
principle and techniques of fitting a response surface to an important aspect of
(3.1a). Second, this inquiry presents search techniques that rely heavily on the
convexity property of Z(x); conversely, such characteristics for any other response
have yet to be shown. Finally, the dynamic and generally unknown nature of the
underlying distribution of $h(x, \omega, T)$ strongly suggests proceeding with a
non-parametric investigation of the region of optimality rather than trying to fit a
response to higher-order moments (Wilson 1995). Therefore, this dissertation
leaves other single- or multiple-response estimations for future study.

The  following  definitions  represent  the  primary  terminology  used 
throughout  this  dissertation: 

Definition. Let $m_{(3.1a)}$ and $n_{(3.1a)}$ represent the number of constraints and
variables, respectively, for (3.1a). Similarly, let $m_{(3.1b)}$ and $n_{(3.1b)}$ denote
the row and column dimensions for (3.1b).

Definition. Let $X = \{x : Ax = b,\ x \ge 0\}$.

Definition. Let $x^*$ represent the optimal solution that minimizes Z(x); i.e.,
$Z(x^*) = \min Z(x)$. Define the region of optimality as the set
$\{x' : Z(x') \le Z(x^*) + \varepsilon,\ \varepsilon > 0,\ x' \in X\}$; i.e., those feasible solutions $x'$ whose objective values
$Z(x')$ are near-optimal (as defined by $\varepsilon$).

Definition. Let $x_{ev}$ represent the optimal solution for the expected value
approximation $\min\{cx + h(x, E[\omega], E[T])\}$, s.t. $x \in X$. (Recall (2.9) in
Section 2.3.1.)
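The two preceding definitions can be illustrated on a toy single-resource problem. The sketch below is a hedged example, not the procedure used in this research: the scenario data, the costs c = 1 and d = 8, and the names `expected_cost` and `in_region_of_optimality` are all assumptions, and enumerating breakpoints works only because this toy Z(x) is piecewise linear in a scalar x.

```python
def expected_cost(x, scenarios, c=1.0, d=8.0):
    """Z(x) for the toy problem: capacity cost plus expected shortage cost."""
    return c * x + sum(p * d * max(omega - x, 0.0) for p, omega in scenarios)


# Hypothetical (probability, demand) scenarios.
SCENARIOS = [(0.25, 1.0), (0.5, 2.0), (0.25, 5.0)]

# Z is piecewise linear and convex here, so a minimizer lies at a breakpoint.
breakpoints = [0.0] + [omega for _, omega in SCENARIOS]
x_star = min(breakpoints, key=lambda x: expected_cost(x, SCENARIOS))

# Expected value approximation: replace the random demand by its mean.
mean_omega = sum(p * omega for p, omega in SCENARIOS)
x_ev = min([0.0, mean_omega],
           key=lambda x: expected_cost(x, [(1.0, mean_omega)]))


def in_region_of_optimality(x, eps):
    """True if x is feasible (x >= 0) and Z(x) <= Z(x*) + eps."""
    return x >= 0.0 and (
        expected_cost(x, SCENARIOS) <= expected_cost(x_star, SCENARIOS) + eps
    )


print("x* =", x_star, "Z(x*) =", expected_cost(x_star, SCENARIOS))
print("x_ev =", x_ev, "Z(x_ev) =", expected_cost(x_ev, SCENARIOS))
print(in_region_of_optimality(4.5, 1.0), in_region_of_optimality(x_ev, 1.0))
```

In this instance the expected value solution lies outside the region of optimality for eps = 1, which is one reason it is treated only as a starting point for a search.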

Definition. Let k be an identifying variable for the iterative sequence of
first-stage vectors $x_1, x_2, \ldots, x_k, \ldots, x_K$ as determined by a search
algorithm described in Section 3.2, and let K denote the total number of
distinct x in the sequence where $x_K = x^*$ or $x_K = x'$.

Definition. Let $z_{ik} = cx_k + h(x_k, \omega_i, T_i)$; i.e., $z_{ik}$ represents the objective
value given $x_k$ for the ith realization of $\omega$ and T.

Definition. Let $Z_s(x)$ represent an unbiased estimator of Z(x) using
sampling technique s. $Z_s(x)$ will be substituted for Z(x) in problems where
calculating Z(x) is computationally prohibitive.
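As one illustration of such an estimator, the sketch below applies crude Monte Carlo sampling (one possible choice of sampling technique s) to a toy single-resource problem. The problem data, the normal-approximation 95% half-width, and the names `estimate_expected_cost` and `draw_omega` are assumptions made for illustration only.

```python
import math
import random
import statistics


def estimate_expected_cost(x, draw_omega, n, c=1.0, d=2.0):
    """Sample mean of z_i = c*x + h(x, omega_i) and an approximate 95% half-width."""
    z = [c * x + d * max(draw_omega() - x, 0.0) for _ in range(n)]
    mean = statistics.mean(z)
    half_width = 1.96 * statistics.stdev(z) / math.sqrt(n)
    return mean, half_width


rng = random.Random(12345)  # fixed seed so the run is repeatable


def draw():
    # Hypothetical discrete demand distribution.
    return rng.choices([1.0, 2.0, 3.0], weights=[0.25, 0.5, 0.25])[0]


mean, hw = estimate_expected_cost(2.0, draw, n=2000)
print(f"Z_s(2.0) = {mean:.3f} +/- {hw:.3f}")
```

The half-width shrinks in proportion to the inverse square root of n, which is the sampling cost that any variance reduction method would aim to lower.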

3.1.3  Example  Problem  (APL1P) 

The recourse problem and formulation in Figure 3.1 (provided by Morton
(1995) and Infanger (1994)) describes the power generation problem APL1P,
where the first-stage decision consists of 2 variables $x^i$ (i = 1, 2) that model the
supply capacity of their respective source nodes 1 and 2 (the constraints
associated with the lower bounds $x^i \ge 1$ correspond to the Ax = b portion in (3.1a)).
The $\omega^j$ reflects the stochastic demand of the destination node j (the $\omega$ portion in
(3.1b)), while $\xi^{11}$ and $\xi^{22}$ represent variation in supply availability (these elements
correspond to the T matrix). Finally, $y^{ij}$ represents the second-stage (recourse)
decision variables that minimize the cost of meeting the demand given the
capacity of supply $x^i$ and realization of the stochastic demand $\omega^j$ (their
technological coefficients constitute W and assume a transportation problem
structure). Representing a standard characteristic of these problems, the recourse
decision variables $y^{3j}$ model the options of purchasing supply outside of the
system's capacity as reflected by the higher unit costs $d^{3j}$; moreover, with no upper
bound they guarantee a feasible solution for any value x and realization of $\omega$.
Given its small dimensionality, APL1P provides an excellent case study for
graphically demonstrating the proposed techniques for fitting and analyzing a
response surface to Z(x). Therefore, Chapters 3 and 4 will refer to this problem as
needed for illustration.

Figure 3.1. Formulation of APL1P Problem

3.2  Non-Linear  Search  Techniques 
3.2.1  Introduction 

The  objective  of  deriving  a  low-order  polynomial  approximation  to  the 
estimated  response  Z(x)  clearly  requires  experimental  data  on  how  Z(x)  responds 
to  changes  in  x.  Formal  experimental  designs  provide  the  best  method  for 
estimating  such  an  approximation  (Box  and  Draper  1987),  but  ultimately  they 
require  knowing  where  to  center  the  experimental  design,  which  independent 
factors  to  include,  what  levels  or  values  to  set  them  at,  and  what  type  of  design 
structure  to  use.  Furthermore,  the  known  convexity  of  Z(x)  notwithstanding,  the 
potentially  enormous  size  of  X  dictates  that  a  quadratic  approximation  will  best  fit 
only a small portion of the sample space (with the evident area of interest being
the region of $x^*$). Finally, due to its $h(x, \omega, T)$ component Z(x) itself must usually
be estimated except for relatively small and discrete distributions of $\omega$ and T.
Obviously,  such  experimental  design  information  about  any  recourse  problem  of 
the  form  (3.1)  is  not  known  beforehand  and  must  therefore  be  found. 
Consequently,  this  dissertation  proposes  adapting  several  search  methods  from  the 
simulation  and  non-linear  programming  literature  as  techniques  for  producing 
data  in  a  way  that  allows  the  construction  of  a  formal  experimental  design  within 
the  region  of  optimality,  and  thus  deriving  an  accurate  response  surface 
approximation. 

The fundamental idea of all the proposed methods involves searching X
using an iterative sequence $x_1, x_2, \ldots, x^*$, where $Z(x_1) \ge Z(x_2) \ge \ldots \ge Z(x^*)$, for
two basic purposes:

1. Optimality. Finding $x^*$, $Z(x^*) = \min Z(x)$, is the logical conclusion of the
iterative search sequence $x_1, x_2, \ldots, x^*$. However, in moving beyond just
discovering an optimal solution to the next step of deriving a response
surface approximation and characterizing the region of optimality, $x^*$ plays a
crucial role as the experimental design centerpoint. Also, in the case of
multiple optimal solutions, the range of the elements of n optimal vectors
$x^*_1, x^*_2, \ldots, x^*_n$ provides guidance on how to scale the elements of the
independent vector x in the formal design.

2. Factor Screening. As a byproduct of acquiring $x^*$, the sequence of first-stage
vectors $x_1, x_2, \ldots, x^*$, and associated estimated responses $Z(x_1), Z(x_2), \ldots, Z(x^*)$,
form a data set for the purpose of screening the elements of x for
significant effects on the estimated response Z(x). As Section 4.3 explains,
such preliminary factor screening can help reduce the size of the formal
experimental design used to calculate the response approximation.

The  necessity  of  screening  the  components  of  x  motivates  the  use  of  gradient- 
based  search  techniques  over  LP-based  decomposition  methods.  Indeed,  what  the 
literature  often  views  as  disadvantages  of  such  line-search  techniques  —  slow 
convergence,  sampling  and  comparison  requirements  —  can  prove  advantageous 
(within  reason)  by  providing  ample  experimental  data  to  effectively  reduce  the 
number  of  independent  variables. 

As reviewed in Section 2.3.4 most search techniques find a directional
vector $d_k$ (not to be confused with the recourse cost vector d in (3.1b)) from an
incumbent feasible solution $x_k$ such that the following conditions hold for a
minimization objective of (3.1):

$$x_{k+1} = x_k + \rho_k d_k \tag{3.2a}$$

$$\rho_k > 0 \tag{3.2b}$$

$$Ax_{k+1} = b, \quad x_{k+1} \ge 0 \tag{3.2c}$$

$$Z(x_{k+1}) \le Z(x_k) \tag{3.2d}$$

Note that (3.2) is equivalent to the projection operator often seen in the stochastic
programming literature (see Section 2.3.4). Furthermore, given the known
convex nature of Z(x), (3.2) implies that for any descending directional vector $d_k$
of any $x_k$ where $Z(x_k) > Z(x^*)$, there exists an optimal $\rho^*_k$ such that for any
$\rho_j \ne \rho^*_k$, $Z(x_k + \rho^*_k d_k) \le Z(x_k + \rho_j d_k)$. Therefore, the three search methods explored in
this research (Geometric Simplex, Projected Gradient, and PARallel TANgents
(PARTAN)) must estimate two components: (1) the steepest directional descent
vector $d_k$ and (2) the optimal scalar multiple $\rho^*_k$.
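The conditions in (3.2) can be sketched as a single feasible descent step. The example below is an assumption-laden illustration: it substitutes a golden-section line search for whatever step-size rule a given method uses, it takes the direction as given rather than estimating it, and the helper names (`max_feasible_step`, `golden_section`, `descent_step`), the quadratic stand-in for Z(x), and the constraint x1 + x2 = 4 are all hypothetical.

```python
import math


def max_feasible_step(x, d):
    """Largest step keeping x + step*d >= 0 (ratio test on decreasing components)."""
    ratios = [-xj / dj for xj, dj in zip(x, d) if dj < 0]
    return min(ratios) if ratios else math.inf


def golden_section(f, lo, hi, tol=1e-6):
    """Approximate minimizer of a unimodal function f on [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    while hi - lo > tol:
        left = hi - inv_phi * (hi - lo)
        right = lo + inv_phi * (hi - lo)
        if f(left) < f(right):
            hi = right   # minimizer lies in [lo, right]
        else:
            lo = left    # minimizer lies in [left, hi]
    return 0.5 * (lo + hi)


def descent_step(Z, x, d, step_cap=1e3):
    """One iteration of (3.2): pick a step on the feasible segment and move."""
    hi = min(max_feasible_step(x, d), step_cap)
    rho = golden_section(lambda t: Z([xj + t * dj for xj, dj in zip(x, d)]), 0.0, hi)
    return rho, [xj + rho * dj for xj, dj in zip(x, d)]


# Convex stand-in for Z(x); the direction keeps x1 + x2 = 4 because A d = 0.
Z = lambda x: (x[0] - 3.0) ** 2 + (x[1] - 1.0) ** 2
x_k = [0.0, 4.0]
d_k = [1.0, -1.0]
rho, x_next = descent_step(Z, x_k, d_k)
print(rho, x_next, Z(x_next))
```

Because the direction satisfies A d = 0, only the non-negativity bounds limit the step, which is why the ratio test alone defines the search interval.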


3.2.2  Geometric  Simplex 

3.2.2.1  Introduction  and  Definitions 

The  geometric  simplex  search  this  effort  implements  follows  the  simplex 
search  algorithm  originally  proposed  by  Spendley,  Hext,  and  Himsworth  (1962), 
as  modified  by  Nelder  and  Mead  (1965)  and  Barton  and  Ivey  (1991).  (For  the 
remainder  of  this  dissertation,  the  term  simplex  refers  to  the  geometric  simplex 
search  as  described  in  this  section,  and  should  not  be  confused  with  the  linear 
programming  optimization  algorithm  of  the  same  name.)  As  explained  by  Barton 
and  Ivey 


For  a  function  of  n  parameters,  the  algorithm  maintains  a  set  of  n+1  points 
in  parameter  space.  This  set  of  points  defines  a  simplex  in  n  dimensions. 
In  two  dimensions,  the  simplex  would  be  a  triangle;  in  three,  a  tetrahedron. 
The Spendley et al. algorithm incorporates a regular simplex (i.e., all
sides  have  the  same  length)  which  does  not  vary  in  size.  The  function  is 
evaluated  at  each  point  of  the  simplex.  The  simplex  then  moves  toward 
the  optimum  by  reflecting  the  point  with  the  worst  function  value  through 
the  centroid  (average)  of  the  remaining  n  points.  In  two  dimensions,  this 
can  be  visualized  as  flipping  over  a  triangle  to  move  it  down  a  hill  (Barton 
and  Ivey  1991, 945). 


Barton and Ivey show how Nelder-Mead modifies the Spendley et al. approach by
allowing the shape of the simplex to change (thus allowing for quicker
convergence). In terms of (3.2), this means $x_k$ represents the vertex with the
highest objective value in the current simplex, and its reflection through the
centroid of the remaining points determines the directional vector of descent $d_k$.
The simplex converges towards $x^*$ by iteratively replacing $x_k$ with a new solution
$x_{k+1}$ found along the directional vector whenever $Z(x_{k+1}) < Z(x_k)$, and stops
when meeting a predetermined termination criterion. Referring to Figure 3.2
below, these simplex moves include:

1. Reflection. The reflection vector extends beyond the centroid in the direction
$d_k$ to a candidate point $x_r$ outside the boundaries of the current simplex.

2. Expansion. If $Z(x_r) < Z(x_k)$, the algorithm finds another candidate point $x_e$
further away from the simplex in the direction $d_k$, and compares $Z(x_r)$ to
$Z(x_e)$.

3. Contraction. If the reflection value $Z(x_r) > Z(x_k)$, the algorithm finds another
candidate point $x_c$ (using $d_k$) either closer to the centroid, or within the
simplex boundaries, and compares $Z(x_c)$ to $Z(x_k)$.

4. Shrinkage. If both the reflection value $Z(x_r) > Z(x_k)$ and the contraction value
$Z(x_c) > Z(x_k)$, then the algorithm shrinks the simplex by moving its points
closer to the best vertex, while keeping the best vertex constant (Barton and
Ivey 1991).

Fig. 3.2. Geometric Simplex Moves (panels: Reflection, Contraction, Shrinkage)

Since preliminary research indicates that item (4) often causes a premature
collapse of the simplex when used on (3.1a), this study adds a fifth option:

5. Enlargement. Expand the simplex by extending its points through the best
vertex to a greater distance on the other side (while keeping the best vertex
constant), in effect flipping and expanding the simplex about its best point.

Item (5) also has the added benefit of providing additional sampling of the
parameter space needed for preliminary factor screening. Finally, as explained
shortly, the simplex search this dissertation implements restricts itself to two
entering candidate evaluations: an interior point $x_c$ halfway between the leaving
candidate $x_k$ and the centroid; and, an exterior point $x_e$ outside the simplex,
halfway to a boundary constraint defined by X.

Definition. Let a point or vertex in the simplex be a feasible vector x of
full dimension $n_{(3.1a)}$ and Z(x) the objective function value as defined in
(3.1). Defining $I = n_{(3.1a)} + 1$ to be the number of vertices in the simplex,
let i index x such that $Z(x_1) \le Z(x_2) \le \ldots \le Z(x_i) \le \ldots \le Z(x_I)$.

Definition. Let $X_k = \{x_i : x_i \in X,\ i = 1, \ldots, I\}$ represent the kth simplex for
(3.1), and by previous definition $X_k \subset X$. Furthermore, letting $x_{i,k}$
represent $x_i \in X_k$, then for any entering candidate $x_{i,k+1} \notin X_k$ and leaving
candidate $x_{i,k} \notin X_{k+1}$, $Z(x_{i,k+1}) < Z(x_{i-1,k})$.

The last definition states that the objective function value for the entering
vertex must be better than that of the next-best vertex after the leaving candidate in
the current simplex. This comparison helps prevent the search from stalling, and
avoids using the best vertex ($x_1$) as a leaving candidate.

The simplex search algorithm can now be stated using these definitions:

Definition. Let $c_{i,k}$ represent the centroid when $x_{i,k}$ is the leaving vertex in
simplex $X_k$. Define $c^j_{i,k} = (I - 1)^{-1}(x^j_{1,k} + x^j_{2,k} + \ldots + x^j_{i-1,k} + x^j_{i+1,k} + \ldots + x^j_{I,k})$
for $j = 1, \ldots, n_{(3.1a)}$. Since $c_{i,k}$ is a convex combination of simplex
vertices, $c_{i,k} \in X$.

Definition. Let $r_{i,k}$ represent the reflection vector when $x_{i,k}$ is the leaving
vertex in simplex $X_k$ and $c_{i,k}$ is its associated centroid. Define $r_{i,k} = c_{i,k} - x_{i,k}$.
Since $c_{i,k} \in X$ and $x_{i,k} \in X$, $Ar_{i,k} = Ac_{i,k} - Ax_{i,k} = b - b = 0$.

Definition. Let $\rho_k$ represent a scalar multiple of the reflection vector $r_{i,k}$
such that the entering vector $x_{i,k+1} = c_{i,k} + \rho_k r_{i,k}$ and $\rho_k \ge 0$.

Definition. Let $R_k = \min\{\,|c^j_{i,k} / r^j_{i,k}| : j = 1, \ldots, n_{(3.1a)} \text{ and } r^j_{i,k} < 0\,\}$. Since
$Ar_{i,k} = 0$, the entering vector $x_{i,k+1}$ will be feasible to Ax = b for any value
of $\rho_k$. However, since $x \ge 0$ and $\rho_k \ge 0$, the only way any element of $x_{i,k+1}$
can be less than zero is if $r^j_{i,k} < 0$. Therefore, $R_k$ represents the maximum
scalar multiple of $r_{i,k}$ such that $x_{i,k+1} \in X$, i.e., $\rho_k \le R_k$.
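The centroid, reflection vector, and maximum multiple defined above can be computed in a few lines. The following sketch is illustrative only: the function names and the toy simplex, whose vertices all satisfy the assumed single constraint x1 + x2 + x3 = 6, are hypothetical. It also forms an interior candidate halfway between the leaving vertex and the centroid, and an exterior candidate halfway from the centroid to the boundary.

```python
import math


def centroid_excluding(vertices, leave):
    """Centroid of all simplex vertices except the leaving one."""
    others = [v for idx, v in enumerate(vertices) if idx != leave]
    return [sum(col) / len(others) for col in zip(*others)]


def reflection_vector(centroid, leaving_vertex):
    """r = c - x for the leaving vertex x."""
    return [cj - xj for cj, xj in zip(centroid, leaving_vertex)]


def max_multiple(centroid, r):
    """Smallest |c_j / r_j| over components with r_j < 0 (infinite if none)."""
    ratios = [abs(cj / rj) for cj, rj in zip(centroid, r) if rj < 0]
    return min(ratios) if ratios else math.inf


# Toy simplex in three dimensions; the last vertex is taken as the leaving one.
vertices = [[6.0, 0.0, 0.0], [0.0, 6.0, 0.0], [0.0, 0.0, 6.0], [1.0, 1.0, 4.0]]
c = centroid_excluding(vertices, leave=3)
r = reflection_vector(c, vertices[3])
R = max_multiple(c, r)

x_c = [cj - 0.5 * rj for cj, rj in zip(c, r)]      # interior candidate
x_e = [cj + 0.5 * R * rj for cj, rj in zip(c, r)]  # exterior candidate
print(c, r, R, x_c, x_e)
```

Since the components of r sum to zero here, every point c + rho * r keeps the assumed equality constraint satisfied, and only the ratio test limits the step.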


3.2.2.2 Geometric Simplex Algorithm

Step 0. (Initialization). Establish the first simplex vertex using the
optimal solution $x_{ev}$ from the expected value approximation. Randomly
select the remaining I - 1 vertices, labeled $x_0$, for the initial simplex $X_1$. Estimate
$Z(x_i)$ and re-index on i using the relationship $Z(x_1) \le Z(x_2) \le \ldots \le Z(x_i) \le \ldots \le Z(x_I)$.
Set simplex counter index k = 1 and i = I. Set N (the
terminating number of vertices) and vertex counter n = 1.

Step 1. If n > N, stop. Otherwise, find $c_{i,k}$ and $r_{i,k}$. Calculate two
entering candidates: $x_c = c_{i,k} - .5\,r_{i,k}$ and $x_e = c_{i,k} + .5\,R_k r_{i,k}$. Compare
$Z(x_{i-1,k})$ to $\min\{Z(x_e), Z(x_c)\}$ using the following guidelines.

If $\min\{Z(x_c), Z(x_e)\} < Z(x_{i-1,k})$ go to Step 2.

If $\min\{Z(x_c), Z(x_e)\} \ge Z(x_{i-1,k})$ go to Step 3.

Step 2. Replace $Z(x_{i,k})$ with $\min\{Z(x_c), Z(x_e)\}$, and $x_{i,k}$ with the appropriate
$x_e$ or $x_c$. Set k = k + 1. Re-index on i using the relationship $Z(x_1) \le Z(x_2) \le \ldots \le Z(x_i) \le \ldots \le Z(x_I)$.
Set i = I, and increment n = n + 1. Go to Step 1.

Step 3. Set i = i - 1. If i = 1 go to Step 4; otherwise, go to Step 1.

Step 4. Based on the number of previous visits to this step do the
following:

Shrinkage. Shrink the simplex by contracting $x_2, \ldots, x_I$ to one-half their
current respective distance from $x_1$ if first or third visit to Step 4.

Enlargement. Enlarge the simplex by moving $x_2, \ldots, x_I$ through $x_1$ to
nine-tenths their current respective distance to the opposing feasible
boundary if second visit to Step 4.

Re-initialization. Otherwise re-initialize the simplex by following the
procedures in Step 0 except: (1) Retain current $x_1$ instead of $x_{ev}$; and, (2)
Retain current value for n. Go to Step 1.
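Steps 0 through 4 can be sketched compactly as follows. This is a simplified reading of the algorithm, not the implementation used in this research, and it rests on several assumptions: Z is a deterministic stand-in rather than an estimated response, feasibility is only x >= 0, a unit multiple is used when a ray never meets a boundary, the Step 4 visit count is cumulative, and a cap on Step 4 visits (`max_step4_visits`) is added so the sketch always terminates. All function and parameter names are hypothetical.

```python
import random


def boundary_multiple(origin, direction, fallback=1.0):
    """Largest t with origin + t*direction >= 0; `fallback` if nothing decreases."""
    ratios = [-o / d for o, d in zip(origin, direction) if d < 0]
    return min(ratios) if ratios else fallback


def simplex_search(Z, x_ev, sample_feasible, N, max_step4_visits=25, seed=0):
    rng = random.Random(seed)
    I = len(x_ev) + 1
    # Step 0: x_ev plus I-1 random feasible vertices, ordered best to worst.
    verts = sorted([list(x_ev)] + [sample_feasible(rng) for _ in range(I - 1)], key=Z)
    n, visits = 1, 0
    while n <= N and visits < max_step4_visits:
        entered = False
        # Steps 1 and 3: try leaving candidates from the worst down to the second best.
        for i in range(I - 1, 0, -1):
            others = verts[:i] + verts[i + 1:]
            c = [sum(col) / (I - 1) for col in zip(*others)]
            r = [cj - xj for cj, xj in zip(c, verts[i])]
            R = boundary_multiple(c, r)
            x_c = [cj - 0.5 * rj for cj, rj in zip(c, r)]
            x_e = [cj + 0.5 * R * rj for cj, rj in zip(c, r)]
            candidate = min((x_c, x_e), key=Z)
            if Z(candidate) < Z(verts[i - 1]):
                # Step 2: accept the candidate, re-index, count the new vertex.
                verts[i] = candidate
                verts.sort(key=Z)
                n += 1
                entered = True
                break
        if entered:
            continue
        # Step 4: no leaving candidate could be improved upon.
        visits += 1
        best = verts[0]
        if visits in (1, 3):   # shrinkage toward the best vertex
            rest = [[b + 0.5 * (v - b) for b, v in zip(best, vert)] for vert in verts[1:]]
        elif visits == 2:      # enlargement through the best vertex
            rest = []
            for vert in verts[1:]:
                direction = [b - v for b, v in zip(best, vert)]
                t = 0.9 * boundary_multiple(best, direction)
                rest.append([b + t * dj for b, dj in zip(best, direction)])
        else:                  # re-initialization, keeping the incumbent best vertex
            rest = [sample_feasible(rng) for _ in range(I - 1)]
        verts = sorted([best] + rest, key=Z)
    return verts[0], Z(verts[0])


# Hypothetical convex test function and sampler over the box [0, 10] x [0, 10].
Z = lambda x: (x[0] - 3.0) ** 2 + (x[1] - 1.0) ** 2
sampler = lambda rng: [rng.uniform(0.0, 10.0), rng.uniform(0.0, 10.0)]
x_best, z_best = simplex_search(Z, [5.0, 5.0], sampler, N=40)
print(x_best, z_best)
```

Because the best vertex is retained by every move, the returned objective value can never be worse than that of the starting vertex. With a noisy estimator in place of Z, each vertex value would need to be stored rather than recomputed.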

3.2.2.3  Implementation 

The termination criterion of a preset number of vertices highlights the major
drawback of the simplex search: its inability to confirm an optimal solution.
Instead, it must assume an optimal solution has been found through other
methods, such as an absence of any improving moves or reaching a predetermined
standard error of the response estimate (Barton and Ivey 1991). In theory, the
stopping criterion would coincide with $Z(x_1) = Z(x^*)$; however, in practice the
variation of the response estimator $Z_s(x)$ can cause false convergence. Quoting
Barton and Ivey again:

The  Nelder-Mead  algorithm  is  widely  used  for  simulation  optimization, 
where  the  functions  it  optimizes  are  often  subject  to  random  noise.  The 
algorithm  is  robust  to  small  inaccuracies  or  stochastic  perturbations  in 
function  values.  This  is  because  the  method  uses  only  the  ranks  of  the 
function values to determine the next move, not the function values
themselves. Perturbations that do not change the ranks of the values will
have no effect on the algorithm's search trajectory.

If  this  noise  is  substantial,  it  will  lead  to  inappropriate  rescaling 
operations,  resulting  in  false  convergence.  Empirically,  this  problem  often 
manifests  itself  as  inappropriate  shrink  steps.  Once  begun,  this  reduction 
in  the  simplex  size  can  reduce  the  variance  of  the  simplex  function  values 
below  the  system's  inherent  variability  before  the  optimum  region  has 
been  reached  (Barton  and  Ivey  1991, 946-947). 


Unfortunately, preliminary tests showed that the variance of $h(x, \omega, T)$ can indeed
be substantial, and thus adversely affect the ordinal ranking of $Z_s(x_i)$ that the simplex
method depends upon to find $d_k$ and $\rho_k$. However, these tests also indicate that
the simplex search finds the region of optimality fairly quickly if, as in Step 4, it
avoids a premature collapse through (1) periodic enlargement and (2)
non-repetitive shrinkage.
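The dependence on ordinal ranking can be seen in a small numerical illustration. The vertex values and noise vectors below are invented for the example, and `ranking` and `perturbed` are hypothetical helper names. A small perturbation leaves the vertex order unchanged, while a large one swaps the best and second-best vertices and would therefore change the next move.

```python
def ranking(values):
    """Indices of the vertices ordered from best (lowest) to worst."""
    return sorted(range(len(values)), key=lambda i: values[i])


def perturbed(values, noise):
    """Add one noise term to each vertex value."""
    return [v + e for v, e in zip(values, noise)]


true_values = [10.0, 12.0, 15.0]   # assumed noise-free responses at three vertices
small_noise = [0.4, -0.3, 0.2]
large_noise = [3.0, -2.5, 0.0]

print(ranking(true_values))
print(ranking(perturbed(true_values, small_noise)))
print(ranking(perturbed(true_values, large_noise)))
```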

The termination criterion, and ultimately the simplex technique itself,
assumes that the scope of the search will provide enough information about the
region of optimality to ensure a positive definite fit for the final response surface
approximation of Z(x). This procedure also assumes that the subsequent
investigation of the minimum ridge estimates (see Section 4.4) will further
characterize the region of optimality, and identify any improvements over the
incumbent solution. Finding the optimal solution in this manner follows the main
idea of steepest descent methods in the response surface literature, except the
simplex method has been substituted for the first-order designs (see Montgomery
1984, or Box and Draper 1987). Finally, variance reduction techniques explored
in Section 4.2 may reduce the variation of $h(x, \omega, T)$ to the "...small inaccuracies or
stochastic perturbations in function values..." that Barton and Ivey describe as acceptable
for the simplex search.

3.2.2 A  Example  Simplex  Search  for  APL1P 

Figure 3.3 illustrates the first two iterations of a 20-iteration simplex search
path for the APL1P problem, and highlights the initial simplex shape and its
form after the entry of $x_2$ (Table 3.1 provides the simplex data for all 20
iterations). As a two-dimensional space, APL1P requires three vertices, with the
initial simplex (outlined by the upper triangle) using the expected value
approximation solution ($x_{ev}$) and two randomly selected ones ($x_0$). The
first iteration replaces the most expensive $x_0$ with $x_1$ through an
expansion move, while the second iteration candidate $x_2$ replaces the other
$x_0$ with a contraction move. After two moves the simplex $X_2$ (outlined by
the lower triangle) is significantly closer to the region of optimality. The
simplex continues to contract until undergoing a shrink-enlarge-shrink cycle
after iteration 17. After two additional contractions, the simplex
re-initializes and undergoes one additional contraction before terminating. The
final near-optimal solution is $x' = x_{19} = (1708, 1685)$ with
$Z(x') = 24647.49$, compared to $x^* = (1800, 1570)$ and $Z(x^*) = 24642.29$.

Figure 3.3. Example Geometric Simplex Moves for APL1P


Table 3.1

Example Geometric Simplex Moves for APL1P
(Random Seed = 440908571)

 k    x             Z(x)        Simplex Move                      Replaces
 0    1529, 1625    24698.47    Initial Vertex (x_ev)             -
 0    2437, 2680    26824.78    Initial Vertex (Random Pick)      -
 0    2502, 3462    28756.62    Initial Vertex (Random Pick)      -
 1    1557, 1077    25216.05    Expand x_e                        3
 2    1990, 2016    24997.72    Contract x_c                      3
 3    1658, 1449    24731.60    Contract x_c                      3
 4    1792, 1776    24676.73    Contract x_c                      3
 5    1659, 1575    24670.40    Contract x_c                      3
 6    1623, 1650    24661.63    Contract x_c                      3
 7    1718, 1694    24647.60    Contract x_c                      3
 8    1666, 1624    24657.75    Contract x_c                      3
 9    1660, 1655    24653.83    Contract x_c                      3
10    1677, 1649    24651.15    Contract x_c                      3
11    1679, 1663    24649.44    Contract x_c                      3
12    1688, 1664    24648.67    Contract x_c                      3
13    1691, 1672    24648.20    Contract x_c                      3
14    1696, 1673    24647.70    Contract x_c                      3
15    1699, 1678    24647.69    Contract x_c                      3
16    1702, 1680    24647.59    Contract x_c                      3
17    1709, 1689    24647.50    Contract x_c                      3
 0    -             -           Shrink                            -
 0    -             -           Enlarge                           -
 0    -             -           Shrink                            -
18    1707, 1685    24647.53    Contract x_c                      3
19    1708, 1685    24647.49    Contract x_c                      3
 0    1708, 1685    24647.49    Best Vertex from Prev. Simplex    -
 0    1722, 2361    25134.60    New Vertex (Random Pick)          -
 0    1176, 1015    25727.43    New Vertex (Random Pick)          -
20    1445, 1519    24792.78    Contract x_c                      3


3.2.3  Projected Gradient

3.2.3.1  Introduction and Definitions

The relationship described in (3.2) requires three basic operations at each
iteration: (1) finding a directional vector $d_k$ satisfying $Ad_k = 0$; (2)
deriving an optimal $\rho^*_k$ such that for any $\rho_j \ne \rho^*_k$,
$Z(x_k + \rho^*_k d_k) < Z(x_k + \rho_j d_k)$; and, (3) recognizing when
$Z(x_k + \rho^*_k d_k) = Z(x^*) = \operatorname{Min} Z(x)$. The projected
gradient method adopted by this study accomplishes these operations through a
synthesis of the following ideas from the literature:

1.  Using a concept first explored by Ermoliev (1983, 1988) within the
context of the recourse problem, the dual variables provide an
unconstrained steepest descent gradient $\nabla Z(x_k)$ for the first-stage
variables $x$. (Also see Murty 1983 for the theoretical LP background on
dual variables as gradients.)

2.  Gaivoronski's (1988) averaging, or statistical estimation, technique for
cases where the estimate of the unconstrained steepest descent gradient is
a function of a random variable; i.e., $\nabla Z(x,\omega,T)$.

3.  Active set methods, whereby any 'true' equality constraints (i.e., no slack
variables present), plus any inequality constraints whose slack variables
currently equal 0, define the 'working surface' upon which a projected
steepest descent gradient $\nabla Z(x_k)$ produces a direction vector $d_k$
such that $Ad_k = 0$.

4.  Estimating the stepsize variable $\rho_k$ by fitting a quadratic model along
the direction of descent defined by $d_k$ using standard linear regression
(Fu 1994, Luenberger 1989).

Applying steepest descent methods from the non-linear programming literature to
the class of problems that (3.1) describes comes with several caveats. First, in
a deterministic setting active set methods can determine whether $x_k$ is
optimal, or otherwise provide active constraint relaxation guidelines to
continue the search for $x^*$ under the Kuhn-Tucker Theorem (Luenberger 1989).
By contrast, the statistical estimation (when necessary) of the unconstrained
descent gradient $\nabla Z(x_k,\omega,T)$ implies the presence of error similar
to that associated with $Z_s(x)$. Second, the non-differentiable property of
$E[h(x,\omega,T)]$ suggests that, barring multiple optimality, $\nabla Z(x_k)$
will never equal zero. Consequently, terminating the projected gradient
algorithm will require the heuristic stopping rules proposed shortly.

The following definitions explain the terminology of the Projected Gradient
Algorithm.

Definition. For the vector $x_k \in E^{n_{(3.1a)}}$, let $A_k$ represent the
matrix composed of the active rows from $Ax = b$, and rows from $x \ge 0$ where
$x_k^i = 0$, $i = 1, \ldots, n_{(3.1a)}$.

Definition. Let $\nabla z(x_k,\omega_i,T_i)$ be the unconstrained gradient for
the $i$th realization of $x_k$.

Definition. Let $\nabla Z_s(x_k)$ represent the unbiased estimate of
$\nabla Z(x_k)$ for sampling technique $s$.

Definition. Let $d_k$ represent a feasible direction of improvement; i.e.,
$\nabla Z_s(x_k) d_k < 0$ and $Ad_k = 0$.

Definition. Let $J_k$ be the projection matrix where
$d_k = -J_k \cdot \nabla Z_s(x_k)$.

Definition. Let $P_k$ represent the minimum scalar multiple of $d_k$ at which a
non-active non-negativity constraint $x^i \ge 0$, $x^i \notin A_k$, goes to 0;
i.e., the vector $x_{k+1} = x_k + \rho_k d_k$ is such that $x_{k+1} \in X$ only
when $0 \le \rho_k \le P_k$.

Definition. Let $\gamma \in E^{n_{(3.1a)}}$ be the stopping criteria for
projected gradient search.

Definition. Let $Q$ represent the number of independent estimates of $Z(x)$
taken at equal distances along the vector starting at $x_k$ in the direction of
$d_k$ and ending at $x_k + P_k d_k$. Then where $q_i = (Q - 1)^{-1} \cdot (i - 1)$,
$i = 1, \ldots, Q$, let $x_{k,q_i} = x_k + q_i P_k d_k$ and
$Z(x_{k,q_i}) = Z(x_k + q_i P_k d_k)$.

Definition. Let $Z_{LN}(x_k,q) = Z(x_k + qP_k d_k) + \varepsilon_{k,LN}$ and
$Z_{QD}(x_k,q) = Z(x_k + qP_k d_k) + \varepsilon_{k,QD}$ represent first- and
second-order polynomial approximations, respectively, of $Z(x_k + qP_k d_k)$ as
functions of $x_k$ and $q$, $0 \le q \le 1$.

3.2.3.2  Projected Gradient Algorithm

Step 0. (Initialization) Set $k = 1$ and $n = 1$. Select $x_k$. Select stopping
criteria $\gamma$ and maximum number of iterations $N$. Construct active set
matrix $A_k$ using row vectors from $Ax = b$, and from $x \ge 0$ for any
$x_k^i = 0$. Estimate and record $Z(x_k)$.

Step 1. Calculate $\nabla Z_s(x_k)$. Find $d_k$ by projecting $\nabla Z_s(x_k)$
onto $A_k$ using projection matrix $J_k$. If $d_k < |\gamma|$ or $n > N$, Stop.
Otherwise, find $P_k$, select $Q$, estimate and record $Z(x_{k,q_i})$,
$i = 1, \ldots, Q$.

Step 2. Derive $Z_{LN}(x_k,q)$ and $Z_{QD}(x_k,q)$. Select $q^*$ under the
following guidelines:

Quadratic Significance. If $Z_{QD}(x_k,q)$ is significant, find $q^*$ by setting
the derivative of $Z_{QD}(x_k,q)$ with respect to $q$ equal to zero and solving.
If $q^* < 0$ set $q^* = .01$. If $q^* > 1$, set $q^* = 1.0$.

Linear Significance. If $Z_{LN}(x_k,q)$ has a negative slope set $q^* = 1.0$;
otherwise set $q^* = .01$.

Neither Fit Significant. Set $q^* = .01$.

Step 3. Set $x_{k+1} = x_k + q^* P_k d_k$, $A_{k+1} = A_k$, and $J_{k+1} = J_k$.
Set $k = k + 1$. Estimate $Z(x_k)$ and $x_k$. Check $x_k$ for either (1)
$x_k^i > 0$ where $x_{k-1}^i = 0$ or (2) $x_k^i = 0$ where $x_{k-1}^i > 0$. If
(1) occurs remove $x_k^i$ from $A_k$; if (2) occurs add $x_k^i$ to $A_k$. If
either (1) or (2) occurs (i.e., $A_k \ne A_{k-1}$) recalculate $J_k$. Return to
Step 1.

3.2.3.3  Theoretical  Development  and  Implementation 

The structure of (3.1) requires that the respective dual variable information
for the $x$ variables must come from the recourse problem (3.1b). Given the
following primal (3.3a) and dual (3.3b) formulations

$$
\begin{aligned}
h(x,\omega,T) = \operatorname{Min}\ & dy \\
\text{s.t.}\ & Tx + Wy = \omega \\
& y \ge 0
\end{aligned}
\tag{3.3a}
$$

$$
\begin{aligned}
h(x,\omega,T) = \operatorname{Max}\ & \pi(\omega - Tx) \\
\text{s.t.}\ & \pi W \le d \\
& \pi \ \text{unrestricted}
\end{aligned}
\tag{3.3b}
$$

Ermoliev (1983, 1988) shows that for the $i$th realization of $\omega$ and $T$

$$ -\nabla z_{ik} = c + \pi_{ik} T_i \tag{3.4} $$

and for $x_k$ the unbiased estimate of the unconstrained steepest descent
gradient for sample size $N$ is

$$ -\nabla Z_s(x_k) = \frac{1}{N} \sum_{i=1}^{N} -\nabla z_{ik}. \tag{3.5} $$

Unfortunately, $A[-\nabla Z_s(x_k)]$ may not equal 0; therefore, a directional
vector $d_k$ that remains feasible beyond just the tangential point of the
constraints to $Z(x)$ must be found by projecting $-\nabla Z_s(x_k)$ onto the
set of active constraints. This active set method of gradient projection
follows the description in Luenberger (1989) as originally suggested by Rosen
(1960) and Gill, Murray, and Wright (1981).

From the definition of a plane, the row $A_k^i$ of $A_k$ is a normal vector to
the subspace defined by the constraint. Therefore, any vector in the Euclidean
space $E^{n_{(3.1a)}}$ (specifically $-\nabla Z_s(x_k)$) can be defined as a
linear combination of $d_k$ and the rows in $A_k$. The projection matrix $J_k$
can then be derived starting with

$$ -\nabla Z_s(x_k) = d_k + (A_k)^T \lambda_k. \tag{3.6} $$

Multiplying (3.6) by $A_k$ and using $A_k d_k = 0$ gives

$$ -A_k \nabla Z_s(x_k) = A_k d_k + A_k (A_k)^T \lambda_k = A_k (A_k)^T \lambda_k \tag{3.7} $$

and multiplying (3.7) by $[A_k (A_k)^T]^{-1}$ reduces to

$$ \lambda_k = -[A_k (A_k)^T]^{-1} A_k \nabla Z_s(x_k). \tag{3.8} $$

Substituting $\lambda_k$ from (3.8) into (3.6) gives

$$ d_k = -J_k \nabla Z_s(x_k) \tag{3.9} $$

where the projection matrix $J_k$ is defined as

$$ J_k = I - (A_k)^T [A_k (A_k)^T]^{-1} A_k. \tag{3.10} $$

Accordingly, for each $x_k$ a directional vector $d_k$ (where $Ad_k = 0$) can be
found using the unconstrained gradient estimate $-\nabla Z_s(x_k)$, and the
projection matrix $J_k$ (3.10) based on the current active matrix $A_k$, with
the relationship (3.9) (Luenberger 1989).
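
The projection in (3.9) and (3.10) can be checked numerically. The following
Python sketch is an illustration only, not the program code used in this
research: it assumes NumPy, an active matrix with full row rank, and purely
hypothetical data (one active equality constraint in three variables). The
function names are likewise assumptions made for the example.

```python
import numpy as np


def projection_matrix(A_k: np.ndarray) -> np.ndarray:
    """Return J_k = I - A_k^T (A_k A_k^T)^{-1} A_k, as in (3.10).

    Assumes the rows of A_k are linearly independent, so A_k A_k^T is invertible.
    """
    n = A_k.shape[1]
    gram = A_k @ A_k.T
    # Solve (A_k A_k^T) M = A_k rather than forming the inverse explicitly.
    return np.eye(n) - A_k.T @ np.linalg.solve(gram, A_k)


def projected_direction(A_k: np.ndarray, grad: np.ndarray) -> np.ndarray:
    """Return d_k = -J_k * grad, as in (3.9)."""
    return -projection_matrix(A_k) @ grad


# Hypothetical data: the single active constraint x1 + x2 = b, and a
# made-up gradient estimate.
A_k = np.array([[1.0, 1.0, 0.0]])
grad = np.array([3.0, 1.0, 2.0])
d_k = projected_direction(A_k, grad)

print(d_k)          # direction lying in the null space of A_k
print(A_k @ d_k)    # numerically zero, so the active constraint stays satisfied
print(grad @ d_k)   # negative, so d_k is a descent direction for this gradient
```

Solving against the Gram matrix instead of inverting it is a numerical
convenience in the sketch; it computes the same $J_k$ as (3.10).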

The directional vector $d_k$ guarantees that the constraints in the current
active set $A_k$ will not be violated. Specifically, for any $\rho_k > 0$ the
vector $x_{k+1} = x_k + \rho_k d_k$ will be feasible to $Ax = b$. However,
following an identical argument from the Geometric Simplex Algorithm, since
$x_k \ge 0$ and $\rho_k > 0$ the only way any $x^i_{k+1}$ can be infeasible is
if $d^i_k < 0$ (since $x^i_{k+1} = x^i_k + \rho_k d^i_k < 0$). Therefore, it
follows directly that

$$ P_k = \operatorname{Min}\left\{ -\frac{x_k^i}{d_k^i} \ \text{for } i = 1, \ldots, n_{(3.1a)} \text{ and } d_k^i < 0 \right\}. \tag{3.11} $$

Furthermore, given the known convexity of $Z(x)$, there exists an optimal
multiplier $0 \le \rho^*_k \le P_k$ for $d_k$ such that for any
$0 \le \rho_j \le P_k$, $\rho_j \ne \rho^*_k$,
$Z(x_k + \rho^*_k d_k) < Z(x_k + \rho_j d_k)$. The literature refers to finding
this optimal $\rho^*_k$ as the stepsize problem, which the following section
addresses.
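
The bound in (3.11) is the familiar ratio test. A minimal Python sketch
follows, assuming NumPy and hypothetical values for $x_k$ and $d_k$; the
function name `max_feasible_step` and the tolerance are assumptions made for
illustration and do not come from the study's code.

```python
import numpy as np


def max_feasible_step(x_k, d_k, tol=1e-12):
    """Largest multiple P_k of d_k that keeps x_k + P_k * d_k >= 0.

    Only components with d_k[i] < 0 can reach zero, so the bound is the
    smallest ratio -x_k[i] / d_k[i] over those components. Returns inf when
    no component decreases (the step is then unbounded by x >= 0).
    """
    x_k = np.asarray(x_k, dtype=float)
    d_k = np.asarray(d_k, dtype=float)
    decreasing = d_k < -tol
    if not decreasing.any():
        return float("inf")
    return float(np.min(-x_k[decreasing] / d_k[decreasing]))


# Hypothetical incumbent and direction.
x_k = np.array([4.0, 6.0])
d_k = np.array([-2.0, -1.0])
P_k = max_feasible_step(x_k, d_k)
print(P_k)              # the first component reaches zero first
print(x_k + P_k * d_k)  # boundary point on x >= 0
```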
Fu (1994), in the context of response surface sequential search (also see Box
and Draper 1987), suggests solving the step-size problem by formulating a line
search defined by the descent gradient as a one-dimensional optimization
problem and fitting a second-order polynomial. This approach offers a special
appeal for finding Min $Z(x)$ due to $Z(x)$'s known convexity and global optimal
characteristics, and partially addresses concerns expressed in the literature
over selecting inefficiently small or large step-sizes (Sivazlian and Stanfel
1975, Luenberger 1989). Therefore, this research extends Fu's idea by testing
for quadratic significance versus a first-order fit and selecting the best
step-size within the constraints defined by $X$.

The basic idea involves deriving a linear and quadratic approximation of
$Z(x_k)$ as a function of $q$; i.e.,

$$ Z_{LN}(x_k,q) = \beta_0 + \beta_1 q + \varepsilon_{k,LN} \tag{3.12} $$

and

$$ Z_{QD}(x_k,q) = \beta_0 + \beta_1 q + \beta_2 q^2 + \varepsilon_{k,QD} \tag{3.13} $$

where $0 \le q \le 1$, based on data derived from an equidistant sampling of the
directional vector $d_k$ from the incumbent solution $x_k$ to the bounds defined
by $x \ge 0$. For example, if $Q = 6$, then the line segment starting at $x_k$
and ending at $x_k + P_k d_k$ will be sampled at intervals of 0.0, 0.2, 0.4,
0.6, 0.8, and 1.0 multiples of $P_k$, and (3.12) and (3.13) derived using $q_i$
as the independent variable and $Z(x_{k,q_i})$ as the dependent variable.
The algorithm allows for several subjective interpretations. First, the
selection of $Q$ in Step 1 involves a trade-off between more accurate regression
estimates on the one hand, and additional computational time on the other. This
research initially uses a standard factor of $Q = 6$ as a compromise, although
the program code allows for $Q < 10$. Second, as in most regression studies the
term 'significant' depends on the views of the analyst and the context of the
analysis (see Draper and Smith 1981). Preliminary research indicates that the
combined effects of the variance of $h(x,\omega,T)$ and the 'flatness' of the
region of optimality can create lack of fit results near constraint boundaries
or the optimal point. Consequently, requiring too high a fit would significantly
lengthen the search process. Therefore, 'significance' in this study constitutes
an R-Square fit greater than 0.9. Third, Step 3 tracks the changes in $x_k$ to
avoid recalculating the projection matrix when there is no need to do so.
Fourth, depending on its location in $X$ the algorithm's quadratic approximation
in Step 2 can easily decide on either a negative value of $q^*$, or one greater
than 1.0; either case is an obviously false estimate assuming the correct
directional gradient. The former can occur as the search approaches $x^*$,
whereas the latter typically would happen whenever $Z(x_k) \gg Z(x^*)$. These
conditions would thus suggest using the predetermined increments 0.01 and 1.0,
respectively.
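
The Step 2 selection rule can be sketched in Python as follows. This is an
illustration under stated assumptions, not the study's program code: it uses
NumPy's `polyfit` for the least-squares fits, takes the R-Square threshold of
0.9 from the text, and adds one assumption of its own, namely that the
quadratic branch is used only when the fitted $\beta_2$ is positive (a convex
fit). The text does not say how a concave fit is handled.

```python
import numpy as np

R2_THRESHOLD = 0.9      # 'significance' cut-off stated in the text
Q_MIN, Q_MAX = 0.01, 1.0  # predetermined increments from Step 2


def r_squared(z, z_hat):
    """Coefficient of determination of fitted values z_hat against z."""
    ss_res = float(np.sum((z - z_hat) ** 2))
    ss_tot = float(np.sum((z - np.mean(z)) ** 2))
    return 1.0 - ss_res / ss_tot if ss_tot > 0 else 0.0


def select_step_fraction(q, z):
    """Choose q* from estimates z[i] of Z(x_k + q[i] * P_k * d_k)."""
    q = np.asarray(q, dtype=float)
    z = np.asarray(z, dtype=float)

    # Second-order fit (3.13); polyfit returns the highest degree first.
    b2, b1, b0 = np.polyfit(q, z, 2)
    quad_r2 = r_squared(z, np.polyval([b2, b1, b0], q))
    if quad_r2 > R2_THRESHOLD and b2 > 0:   # b2 > 0 is this sketch's assumption
        q_star = -b1 / (2.0 * b2)           # stationary point of the parabola
        if q_star < 0:
            return Q_MIN
        if q_star > 1:
            return Q_MAX
        return float(q_star)

    # First-order fit (3.12).
    slope, intercept = np.polyfit(q, z, 1)
    lin_r2 = r_squared(z, np.polyval([slope, intercept], q))
    if lin_r2 > R2_THRESHOLD:
        return Q_MAX if slope < 0 else Q_MIN

    return Q_MIN                            # neither fit significant


# Usage with Q = 6 equidistant sample points and made-up response values.
q_grid = np.linspace(0.0, 1.0, 6)
print(select_step_fraction(q_grid, (q_grid - 0.4) ** 2 + 5.0))
```

Because the quadratic model nests the linear one, its R-Square can never be
lower; the convexity guard is what lets the linear branch be reached in this
sketch.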

Finally, a word about the termination criteria. Recognizing the limitations of
estimating both $Z(x)$ and $\nabla Z(x_k)$ by using pre-determined stopping
criteria $\gamma$ and $N$ does not detract from stopping at a near-optimal
solution $x'$. Recalling the principal objective of deriving a response surface
approximation of $Z(x)$ through experimental design, $Z(x')$ or $Z(x^*)$
represents the starting point of the process. Ermoliev (1988) emphasizes this
practical viewpoint from the context of finding the optimal solution using
stochastic gradient methods; thus, it follows that by conducting this search
within the more practical context of response surface approximation,
terminating the search with a near-optimal solution is justified.

3.2.3.4  Example  Projected  Gradient  for  APL1P 

Figure 3.4 (based on Table 3.2 below) provides a graphic illustration of the
Projected Gradient Algorithm applied to the APL1P problem. Starting with an
initial solution $x_1 = (3600, 3600)$, the algorithm very quickly moves to the
region of optimality (as compared with the Geometric Simplex Algorithm), but
slows appreciably afterwards in finding the optimal point. This slowdown most
likely results from less accurate estimates of $q^*$ provided by the quadratic
estimates of a relatively flat region in the direction of least sensitivity.
Such inaccuracy manifests itself in the small negative estimates of $q^*$ for
$x_4$ through $x_{10}$; in these cases the estimated quadratic fit places the
minimum point behind the direction of descent (thus requiring a .01 input for
the stepsize). This phenomenon most likely occurs because the incumbent $x_k$ is
so close to the optimal solution that most (or all) samplings of the directional
vector $d_k$ are greater than $Z(x_k)$, thus forcing a least-squares estimate of
a curve with minimal sampling in the opposite direction. Nonetheless, the
algorithm provides accurate directional information even as the stepsize
problem prevents a quicker convergence.


Table 3.2

Example Projected Gradient Iterations for APL1P
(Scenarios = 1280, Q = 6)

 k    x             Gradient        Est. q*    Act. q*    R^2     Z(x)†
 1    3600, 3600    3.28, 2.42       .613       .613      .990    32224.4
 2    1394, 1969    -.094, .105      .079       .079      .995    24693.1
 3    1532, 1815    -.103, .052      .044       .044      .999    24667.7
 4    1623, 1769    -.048, .104     -.158       .010*     .995    24657.1
 5    1632, 1751    -.050, .100     -.168       .010*     .995    24654.9
 6    1640, 1733    -.050, .097     -.185       .010*     .995    24652.7
 7    1649, 1716    -.081, -.069    -.022       .010*     .999    24650.7
 8    1669, 1700    -.068, -.007    -.183       .010*     .998    24649.3
 9    1688, 1701    -.035, .022     -.019       .010*     .999    24648.6
10    1707, 1689    -.035, .022      -          -         -       24647.6

* - Manual Input    † - Z(x*) = 24642.3, x* = (1800, 1570)


Figure 3.4. Example Projected Gradient Iterations for APL1P ($x_4$ through $x_9$ Omitted)

3.2.4  Parallel  Tangents  (PARTAN) 

3.2.4.1  Introduction  and  Definitions 

Both Luenberger (1989) and Sivazlian and Stanfel (1975) discuss the special
PARTAN procedure adapted by this dissertation. As shown in Figure 3.5, the basic
idea involves finding a point $p_k$ in the $x$ parameter space using a steepest
descent technique (in this case the Projected Gradient Algorithm) from $x_k$
such that $Z(p_k) < Z(x_k)$. Then, $x_{k+1}$ is found by minimizing $Z(x)$ along
the line defined by $x_{k-1}$ and $p_k$. Furthermore, at each iteration $x_k$ is
checked for optimality as it would be under the Projected Gradient Algorithm,
including any adjustment to its active constraint matrix $A_k$. The formal
definitions and algorithm are described below.


Definition. Let $p_k$ be the next point in a Projected Gradient Algorithm with
respect to $x_k$; i.e., $p_k = x_k + q^*_k P_k d_k$.

Definition. Let $t_k$ represent the vector defined by $p_k$ and $x_{k-1}$; i.e.,
$t_k = p_k - x_{k-1}$.

Definition. Let $\phi_k = \operatorname{Min}\{\, \ldots / t_k^i$ for
$i = 1, \ldots, n_{(3.1a)}$ and $d_k^i < 0 \,\}$.

Definition. Let $R$ represent the number of independent estimates of $Z(x)$
taken along the vector starting at $x_{k-1}$ in the direction of $t_k$ and
ending at $p_k + \phi_k t_k$. Then where $r_i = (R - 1)^{-1} \cdot (i - 1)$,
$i = 1, \ldots, R$, let $x_{k+1,r_i} = x_{k-1} + r_i \phi_k t_k$.

Definition. Let $x_k$ be the current minimum point found by minimizing along the
line defined by $x_{k-2}$ and $p_{k-1}$; i.e.,
$x_k = x_{k-2} + r^*_i \phi_{k-1} t_{k-1}$.

3.2.4.2  PARTAN Algorithm

Step 0. (Initialization) Select $x_0 \in X$. Estimate
$x_1 = x_0 + q^*_0 P_0 d_0$ using the Projected Gradient Algorithm. Set $k = 1$,
$n = 1$, the maximum number of iterations $N$, and stopping criteria $\gamma$.

Step 1. Calculate $p_k = x_k + q^*_k P_k d_k$ using the Projected Gradient
Algorithm. If $d_k < |\gamma|$ or $n > N$, Stop. Otherwise find $x_{k+1}$ by
minimizing along the entire feasible line defined by $x_{k-1}$ and $p_k$; i.e.,
find $r^*_i$ using the quadratic fit procedure described in Step 2 of the
Projected Gradient Algorithm such that $x_{k+1} = x_{k-1} + r^*_i \phi_k t_k$.
Set $k = k + 1$, $n = n + 1$, and repeat.

3.2.4.3  Theoretical  Development  and  Implementation 

One notable change from the Projected Gradient Algorithm involves the line
minimization requirement in Step 1. In the Projected Gradient Algorithm the
quadratic fit involves the line defined along the descent gradient forward from
the incumbent solution $x_k$ to a constraint boundary. However, the PARTAN
Algorithm pays the additional computational cost of searching the entire
feasible line both forward and behind $p_k$. This difference stems from the
unique objectives and assumptions of the two algorithms. At each iteration the
Projected Gradient Algorithm focuses on the stepsize problem while assuming it
has the correct directional descent vector; hence, the need for estimating a
quadratic fit ahead of $x_k$. By contrast, the PARTAN Algorithm's quadratic
assumption avoids the stepsize problem through the use of parallel tangents;
therefore, it concentrates instead on finding a good global quadratic estimate
at $p_k$.

The PARallel TANgent (PARTAN) approach extends steepest descent techniques to
the special case where the non-linear function being estimated is a positive
definite quadratic (Sivazlian and Stanfel 1975). Since the projected gradient
method falls into the class of steepest descent algorithms, and given that this
research assumes $Z(x)$ can be well approximated with a quadratic fit, PARTAN
methods provide an obvious search technique to try. Indeed, as Luenberger shows,
for a pure quadratic function PARTAN is equivalent to the conjugate gradient
method, which itself has "...proved to be extremely effective in dealing with
general objective functions and ...[is] considered among the best general
purpose methods presently available" (Luenberger 1989, 238). Similarly,
Sivazlian and Stanfel (1975) show, in theory and for a pure n-dimensional
quadratic, how PARTAN will find the optimal solution in just n-1 optimizations.
They also note that for non-quadratic functions, PARTAN can still perform well
in the reduced region of optimality where a quadratic approximation would be
more accurate. Nonetheless, the single greatest theoretical advantage PARTAN
brings to the problem (3.1a) is its strong global convergence properties.
Quoting Luenberger again:

Each step of the process is at least as good as steepest descent; since going
from $x_k$ to ... [$p_k$] ... is exactly steepest descent, and the additional
move to $x_{k+1}$ provides further decrease of the objective function. Thus
global convergence is not tied to the fact that the process is restarted every n
steps (Luenberger 1989, 256-257).

Although (3.1a) is neither deterministic nor purely quadratic, the claims made on
behalf of PARTAN justify investigating its performance within such a context.
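
The parallel-tangent idea is easiest to see on a deterministic, unconstrained,
positive definite quadratic, which is a much simpler setting than the recourse
problem (3.1a). The Python sketch below makes exactly those simplifying
assumptions, uses exact line minimization in place of the regression-based
stepsize estimate, and works with a hypothetical two-variable quadratic
$f(x) = \tfrac{1}{2}x^T H x - b^T x$. All names in it are illustrative.

```python
import numpy as np


def line_min(H, b, x, d):
    """Exact minimizer of f(x) = 0.5 x'Hx - b'x along the line x + a*d.

    Assumes H is positive definite and d is non-zero.
    """
    g = H @ x - b                       # gradient at x
    return x - (g @ d) / (d @ H @ d) * d


def partan_cycle(H, b, x0):
    """One PARTAN cycle: two steepest descent moves, then an acceleration move."""
    x1 = line_min(H, b, x0, -(H @ x0 - b))   # steepest descent from x0
    p1 = line_min(H, b, x1, -(H @ x1 - b))   # steepest descent from x1
    x2 = line_min(H, b, x0, p1 - x0)         # minimize along the line through x0 and p1
    return x1, p1, x2


# Hypothetical quadratic in two variables.
H = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
x1, p1, x2 = partan_cycle(H, b, np.array([2.0, 1.0]))

print(x2)                       # point reached after the acceleration move
print(np.linalg.solve(H, b))    # exact minimizer, for comparison
```

In two dimensions the gradients at $x_0$ and $p_1$ are parallel, so both points
lie on a line through the minimizer; this is why the final line minimization
lands on it for a pure quadratic. With estimated responses and a non-quadratic
$Z(x)$, no such exactness should be expected.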

3.2.4.4  Example PARTAN

Figure 3.6 shows the results of the PARTAN algorithm for the APL1P problem.
Following Sivazlian and Stanfel's (1975) argument that where $x$ has a dimension
of $n_{(3.1a)}$ the global minimum should be found in $n_{(3.1a)} - 1$
iterations of the PARTAN algorithm, Figure 3.6 and its accompanying table show
just one cycle. As expected, the PARTAN Algorithm exhibits a stronger
convergence property than either the Geometric Simplex or Projected Gradient
methods; it achieves in two steps what took the Geometric Simplex Algorithm
seven and the Projected Gradient Algorithm five. Furthermore, the lines in
Figure 3.6, coupled with the fact that $Z(x_{k+1}) \approx Z(p_k)$ and the
absence of a pure quadratic fit, suggest a major source of error stems from the
inaccuracy of the quadratic approximation (a phenomenon also present in the
Projected Gradient Algorithm).


Figure 3.6. Example PARTAN Iteration for APL1P
(Lines Indicate Parallel Tangents)


Table 3.3

Example PARTAN Iteration for APL1P, Scenarios = 1280, Q = 6

 k      x             Est. q*    Act. q*    R^2     Z(x)†
 x_0    3600, 3600    .613       .613       .990    32224.4
 x_1    1394, 1969    .513       .513       .996    24693.1
 p_1    1615, 1721    .539       .539       .990    24654.5
 x_2    1660, 1764    -          -          -       24655.1

† - Z(x*) = 24642.3, x* = (1800, 1570)

3.3  Linear  Programming  Algorithms 
3.3.1  Introduction 

The cost of calculating $Z(x)$ described in the non-linear programming
literature from which these methods are adapted usually presumes $Z(x)$ in a
deterministic context. Unfortunately, in the present problem $h(x,\omega,T)$
compounds the computational expense by requiring statistical estimation of
$Z(x)$. Therefore, search efficiency is no longer solely a matter of
convergence, but one of response estimation as well. Consequently, this research
looks into two broad areas of efficient estimation of $Z(x)$: Linear Programming
Algorithms and Variance Reduction. Chapter 4 addresses variance reduction
techniques, while the present section focuses on LP algorithmic improvements.

Obviously, the most straightforward method for solving $z_{ik}$ would simply
perform a standard primal or dual simplex algorithm for each new right-side
vector $\omega_i - T_i x_k$, and perhaps retain the previous optimal basis in
the hope of only having to do a couple of pivots (if any). This optimization
method, called the OSL option after the set of IBM library routines used to
implement it, constitutes the first choice investigated by this research. It
also forms the basis for comparing other alternatives for finding $z_{ik}$ that
begin with an assumption hinted at by the idea of minimal pivot requirements.
These alternatives essentially employ the repetitive use of either an Optimal
Basis Set (OBS) or an equivalent collection of Optimal Dual Vectors (ODV).

Specifically, the three proposed alternatives (the OBS-Complete, OBS-Reset, and
ODV Algorithms) make the following crucial and fundamental assumption: If a
relatively small set of optimal bases characterizes a subset of $X$, then an
algorithm faster than OSL can be developed by taking advantage of that set of
optimal primal bases or its equivalent dual vectors. Of course, the efficiency
of the OBS and ODV methods depends on the size of the recourse problem (3.1b),
the definition of the subset of $X$, and the size of $X$ itself. Yet even for
large problems, there are cases where the OBS algorithm can prove advantageous.
The following sections develop these concepts beginning with the following
definitions.

Definition. Let $t_{ik}$ be the $i$th realization of $\omega - Tx_k$; i.e.,
$t_{ik} = \omega_i - T_i x_k$. Furthermore, let $t_{ev} = E[\omega] - E[T]x$.

Definition. Let $P_{ik}$ be the optimal recourse basis in (3.1b) for
$t_{ik} = \omega_i - T_i x_k$; i.e., $(P_{ik})^{-1} t_{ik} = y_{ik}$.
Furthermore, let $P_{ev,k}$ be the optimal basis for $t_{ev}$ and $x_k$.

Definition. Let $\mathcal{P}$ represent a subset of the set of distinct positive
cones containing every possible realization of $\omega - Tx$.

Definition. Define $L$ as the number of optimal bases in set $\mathcal{P}$.

Definition. Let the random sample $t_j \in \{\omega_j - T_j x : x \in X\}$, for
$j = 1, \ldots, J$.

Definition. Let the set $\mathcal{T}$ consist of sample realizations $t_j$ which
are not members of any positive cone defined by the elements of set
$\mathcal{P}$.

Definition. For an optimal set of bases $\mathcal{P}$ of size $L$ to the
recourse problem (3.1b), where optimal basis $P_l \in \mathcal{P}$, and
$l = 1, \ldots, L$, let $\pi_l$ be the optimal dual vector associated with
$P_l$, and let the set $\Pi$ of size $L$ consist of the corresponding set of
optimal dual vectors $\pi_l$, $l = 1, \ldots, L$.

3.3.2  OBS-Complete  Algorithm 

Step 0. (Initialization) Select $x$ and solve (3.1b) for $t_{ev}$. Add optimal
basis $P_{ev}$ to $\mathcal{P}$, associated optimal dual vector $\pi_{ev}$ to
$\Pi$, set $L = 1$ and remove elements in $\mathcal{T}$. Notationally,
$P_{ev} = P_1$.

Step 1. Select sample size $J$ and obtain random samples $t_j$,
$j = 1, \ldots, J$, using for each $t_j$ a randomly selected $x \in X$. Set
non-optimal counter $n = 0$. Proceed with one of two options:

Frequency of Optimality. Set frequency of basis optimality counter $F_l = 0$,
$l = 1, \ldots, L$. For each $t_j$, $j = 1, \ldots, J$, and each optimal basis
$P_l \in \mathcal{P}$, $l = 1, \ldots, L$, find $y_{jl} = (P_l)^{-1} t_j$. If
$y_{jl} \ge 0$ set $F_l = F_l + 1$. If $y_{jl} \not\ge 0$ for all
$l = 1, \ldots, L$, then put $t_j$ in $\mathcal{T}$ and set $n = n + 1$. Index
bases in $\mathcal{P}$ on $l$ such that
$F_1 \ge \ldots \ge F_l \ge \ldots \ge F_L$. Go to Step 2.

Basic Coverage. For each $t_j$, $j = 1, \ldots, J$, find
$y_{jl} = (P_l)^{-1} t_j$. Terminate operation for current $t_j$ upon finding
first $y_{jl} \ge 0$ and proceed with $t_{j+1}$. If $y_{jl} \not\ge 0$ for all
$l = 1, \ldots, L$, then put $t_j$ in $\mathcal{T}$ and set $n = n + 1$. Go to
Step 2.

Step 2. If $n = 0$ STOP. Otherwise, set $L = L + 1$. Select $t_n$ from
$\mathcal{T}$. Solve (3.1b), add optimal basis $P_n$ to $\mathcal{P}$ and
optimal dual vector $\pi_n$ to $\Pi$, remove elements in $\mathcal{T}$, and
repeat Step 1.
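
The Frequency of Optimality option in Step 1 amounts to counting, for each
stored basis, how many sampled right-side vectors it remains feasible for. A
minimal Python sketch follows; it assumes NumPy, that the inverses of the
stored bases are available as dense matrices, and hypothetical 2-by-2 data.
The function name and tolerance are illustrative assumptions.

```python
import numpy as np


def frequency_of_optimality(basis_inverses, samples, tol=1e-9):
    """Count how often each stored basis is feasible for the sampled t_j.

    Returns the counters F_l, the basis indices sorted by decreasing F_l,
    and the samples covered by no stored basis (the set T of the algorithm).
    """
    counts = np.zeros(len(basis_inverses), dtype=int)
    uncovered = []
    for t in samples:
        feasible = [bool(np.all(B_inv @ t >= -tol)) for B_inv in basis_inverses]
        counts += np.array(feasible, dtype=int)
        if not any(feasible):
            uncovered.append(t)
    order = np.argsort(-counts, kind="stable")
    return counts, order, uncovered


# Hypothetical stored bases: the first is feasible when t >= 0, the second
# when t[0] <= 0 and t[1] >= 0.
basis_inverses = [np.eye(2), np.array([[-1.0, 0.0], [0.0, 1.0]])]
samples = [np.array(v) for v in
           ([1.0, 2.0], [-1.0, 2.0], [-3.0, 1.0], [-2.0, 4.0], [1.0, -1.0])]

counts, order, uncovered = frequency_of_optimality(basis_inverses, samples)
print(counts, order, len(uncovered))
```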

3.3.3  OBS-Reset  Algorithm 

Step 0. (Initialization) Clear optimal basis set $P$ associated with previous
first-stage vector $x_{k-1}$. Solve for the first sample $t_{1k}$ using OSL, and place
the associated optimal basis $P_1$ in $P$.

Step 1. Select sample size $J$. For each following sample $t_{jk}$, $j = 2, \ldots, J$,
first check for a feasible solution from set $P$; i.e., where $(P_l)^{-1} t_{jk} \ge 0$,
$l = 1, \ldots, L$ and $L < j$. If a feasible solution is found, then by Proposition 3.1
it is optimal. Otherwise, solve for $t_{jk}$ using OSL, set $L = L + 1$, set
$P_L = P_{jk}$, and place $P_L$ in $P$. Repeat until completing sampling requirements.
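
A minimal Python sketch of the OBS-Reset loop follows. It assumes a user-supplied solver callback in place of OSL; `toy_solve_lp` and its cost vectors are hypothetical, chosen for a simple-recourse problem with W = [I, -I] so that the example is self-contained.

```python
def apply_basis(b_inv, t):
    """Multiply a stored basis inverse by the right-side vector t."""
    return [sum(a * t_i for a, t_i in zip(row, t)) for row in b_inv]

def obs_reset(samples, solve_lp, tol=1e-9):
    bases = []                    # Step 0: basis set cleared for this first-stage vector
    values, lp_calls = [], 0
    for t in samples:
        hit = None
        for b_inv, d_b in bases:                  # check stored bases first
            y = apply_basis(b_inv, t)
            if all(y_i >= -tol for y_i in y):
                hit = (y, d_b)                    # feasible, hence optimal (Prop. 3.1)
                break
        if hit is None:                           # otherwise fall back on the LP solver
            b_inv, d_b = solve_lp(t)
            lp_calls += 1
            bases.append((b_inv, d_b))
            hit = (apply_basis(b_inv, t), d_b)
        y, d_b = hit
        values.append(sum(d * y_i for d, y_i in zip(d_b, y)))
    return values, lp_calls

def toy_solve_lp(t, cost_plus=(3, 4), cost_minus=(1, 2)):
    """Hypothetical stand-in for OSL on W = [I, -I]: returns the inverse of the
    optimal basis and the costs of its basic variables."""
    n = len(t)
    signs = [1 if t_i >= 0 else -1 for t_i in t]
    b_inv = [[signs[i] if i == j else 0 for j in range(n)] for i in range(n)]
    d_b = [cost_plus[i] if signs[i] > 0 else cost_minus[i] for i in range(n)]
    return b_inv, d_b

values, lp_calls = obs_reset([(5, -2), (1, -7), (-1, -1), (2, -3)], toy_solve_lp)
print(values, lp_calls)
```

Only two of the four samples trigger a solver call in this example; the other two are priced from a stored basis.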

3.3.4  ODV  Algorithm 

Step 0. Select sample size $J$. For each following sample $t_j$, $j = 1, \ldots, J$,
by Proposition 3.2 find $\max\{\pi_l(\omega_j - T_j x);\ \pi_l \in \Pi,\ l = 1, \ldots, L\}$.

3.3.5 Theoretical Development and Implementation

3.3.5.1  OBS-Complete  and  OBS-Reset 

The idea of exploiting the existence of a small number of optimal bases for
(3.1b) is not new. It motivates the sifting algorithm of Garstka and Rutenberg
(1973); bunching (Wets 1983, 1988) and its extension by Haugland and Wallace
(1988); and the polyhedral cone decomposition algorithm for transportation
problems proposed by Wallace (1986). It is also informally accepted as
reasonable by current researchers in the field (Morton 1994b). However, unlike
the sifting and bunching approaches, which presume a lattice structure of
discrete values for $\omega$ and $T$, this dissertation proposes explicitly decomposing
the region of interest into optimal bases (also referred to in the literature as
positive cones, polyhedral cones, and decision regions) under more general
conditions of continuous, non-independent random variables in $\omega$ and $T$, without
requiring any special structural properties other than what is already defined
for (3.1). The following proposition from linear programming (see Murty 1983 or
Bazaraa, Jarvis, and Sherali 1990), as applied in the context of the two-stage
recourse problem (3.1), provides the theoretical underpinning for this approach:
if a variation of the right-side vector remains within the positive cone defined
by an optimal basis, then that basis is also optimal for the perturbed value.

Definition: Let $t_{rk}$ represent the $r$th realization of $\omega_r - T_r x_k$; similarly let
$t_{sk}$ represent the $s$th realization of $\omega_s - T_s x_k$.

Proposition 3.1: Unique Optimal Basis for Positive Cone

If $P_{rk}$ is the optimal basis for $t_{rk}$ in (3.1b), then for any $t_{sk} \ne t_{rk}$ and
$t_{sk} \in \mathrm{Pos}(P_{rk})$, $P_{rk}$ is the optimal basis for $t_{sk}$.

Proof: Let $t_{sk} \in \mathrm{Pos}(P_{rk})$. By definition of positive cone there exists a vector
$y_{sk}$ such that $P_{rk} y_{sk} = t_{sk}$, $y_{sk} \ge 0$, whose primal solution value is
$d_P (P_{rk})^{-1} t_{sk}$.

Since $\pi_{rk} = d_P (P_{rk})^{-1}$, dual feasibility remains unaffected; this gives the
dual solution $d_P (P_{rk})^{-1} t_{sk}$.

Since primal feasible and dual feasible solutions are equal, by
fundamental duality $P_{rk}$ is an optimal basis for $t_{sk}$. ■

Obviously, the most efficient and useful result of Proposition 3.1 would be
a small manageable set of optimal bases $P = \{P_l,\ l = 1, 2, \ldots, L\}$ such that,
for any $x_k \in X$ and all realizations of $\omega$ and $T$, the associated optimal basis
belongs to $P$. In this case $P$ contains every optimal basis (hence solution) for
the entire space defined by $Ax = b$, $x \ge 0$, and for every realization of $\omega$ and
$T$, and it provides a computational improvement over an OSL-based algorithm in
the following manner. Instead of performing the pivots of a revised primal or
dual simplex for each new $t_{sk}$, the OBS algorithm checks for a feasible solution
by multiplying $t_{sk}$ by the inverses of the bases in $P$. If a feasible solution is
found (i.e., $(P_l)^{-1} t_{sk} \ge 0$, $P_l \in P$), then $z_{sk} = d_P (P_l)^{-1} t_{sk}$.

Furthermore, not every basis in $P$ must be checked. Since Proposition 3.1 proves
that any feasible basis in $P$ is optimal, once a feasible basis is found the search
may stop. (This would also suggest a rank ordering of the set $P$ by frequency of
optimality.) Finally, not every individual element of the candidate solution
vector $(P_l)^{-1} t_{sk}$ must be checked; if any one element is found to be negative,
then the vector is automatically infeasible and the algorithm can immediately
skip to the next basis $P_{l+1}$.

This dissertation implements this approach using a separate program
called the OBS-Complete program. It precedes the second Monte Carlo
simulation responsible for the experimental design (referred to as the response
surface approximation (RSA) program) by preprocessing the recourse problem
(3.1b) to find the complete optimal bases set $P$. The problem of finding all of the
optimal bases for (3.1b) is equivalent to discovering, out of all the possible bases
available from $W$, the associated positive cones contained within the requirement
space defined by $\omega - Tx$, $x \in X$. The OBS-Complete program accomplishes this
task through an iterative process consisting of two key steps: (1) updating the set
of optimal bases $P$ and (2) uniformly sampling the recourse requirement space
against the set of optimal bases $P$ to determine the extent that the sampled
realizations of $\omega - Tx$ fall within the positive cones defined by the current set $P$.
Thus, the OBS-Complete Algorithm essentially adds to and tests the set $P$ until
it possesses all optimal bases for the sampled realizations of $\omega - Tx$. The RSA
algorithm then uses the optimal bases set supplied by the OBS-Complete program
to estimate $Z(x)$ and ultimately derive the response surfaces of interest. (The
ODV Algorithm also depends on the OBS-Complete program to supply it with
the dual vectors associated with the primal optimal bases.)

Although in theory such an optimal basis set can be found for any
problem, as a practical matter only relatively small recourse problems allow such
complete 'coverage' of the feasible region. Furthermore, at some point the size of
the set $P$, in combination with the frequency distribution of optimality among its
elements, will offset the computational advantages just described. Yet even for
larger problems the OBS approach can still prove helpful if the region of interest
is a small subset of $X$, even to the point of being a single distinct vector $x$.
Therefore, this research applies this basic technique in two ways:

1. OBS-Complete. This method defines the region of interest as $X$; in other
words, complete coverage of the recourse requirements space $\omega - Tx$,
$x \in X$, can be provided by a small set of optimal bases $P$.

2. OBS-Reset. This method defines the subset of $X$ to be a single vector $x_k$.
In effect, this requires the optimal basis set $P_k$ to be cleared (or 'reset') for
each new vector $x_{k+1}$. Although clearly not as efficient as the OBS-Complete
method, for larger recourse problems with numerous stochastic
variables in $\omega$ and a higher-dimensional $x$, such an approach can still offer
computational advantages over an OSL-based estimation method.

Unlike the OBS-Complete or ODV Algorithms, OBS-Reset does not
require a preprocessed basis set. Therefore, the response surface analysis
algorithm RSA directly incorporates the OBS-Reset Algorithm along with the
OSL option. Finally, because of the additional computational and storage
expense, OBS-Reset does not store an associated set of dual vectors, since they do
not provide any additional information on the estimate of $Z(x)$.

3.3.5.2 ODV


The  third  basic  approach  to  improving  the  efficiency  of  estimating  Z(x) 
from  a  linear  programming  algorithmic  perspective  uses  the  set  of  dual  vectors 
associated  with  the  optimal  bases  set  P.  Using  an  idea  suggested  by  Morton 
(1995) and duality results from linear programming theory (see Murty 1983 or
Bazaraa, Jarvis, and Sherali 1990), the following proposition establishes the ODV
technique. 

Proposition 3.2: Dual Optimality of the Recourse Problem

Given a finite set of optimal bases $P$ which contains at least one optimal
basis for any realization of $\omega - Tx$ to the primal recourse problem Min $dy$ s.t.
$Wy = \omega - Tx$, $y \ge 0$, with a corresponding set of optimal dual vectors $\Pi$ for the
dual recourse problem Max $\pi(\omega - Tx)$, s.t. $\pi W \le d$, $\pi$ unrestricted; then, for any
realization of $\omega - Tx$ the optimal solution
$dy^* = \max\{\pi_l(\omega - Tx),\ l = 1, 2, \ldots, L;\ \pi_l \in \Pi\}$.

Proof:

Step 1. By assumption any realization of $\omega - Tx$ has the property of possessing
an optimal basis $P^* \in P$ and its corresponding optimal dual vector $\pi^* \in \Pi$.

Step 2. Since the dual constraint $\pi W \le d$ remains constant, every $\pi_l \in \Pi$ is
feasible for any realization of $\omega - Tx$.

Step 3. By strong duality $dy^* = \pi^*(\omega - Tx)$ at optimality.

Step 4. By weak duality $dy^* \ge \pi_l(\omega - Tx)$ for every $\pi_l$ with $\pi_l W \le d$. Since by
Step 1 $\pi^* \in \Pi$ and by Step 2 every $\pi_l \in \Pi$ is feasible, the optimal solution
must be $\max\{\pi_l(\omega - Tx),\ l = 1, 2, \ldots, L;\ \pi_l \in \Pi\}$; otherwise, it would
contradict Step 3. ■

The ODV Algorithm directly implements Proposition 3.2 by simply
multiplying the current recourse right-side vector $(\omega_j - T x_k)$ against all dual
vectors $\pi_l \in \Pi$, and selecting the highest resulting product as the optimal solution.
Unlike the OBS method, where the search through the optimal bases set $P$ stops
after finding a feasible solution, ODV must check every dual vector $\pi$ in $\Pi$ in
order to guarantee an optimal solution. For this reason, a dual vector counterpart
to the OBS-Reset option is not possible, since primal feasibility cannot be
determined directly from $\Pi$. In other words, an ODV Algorithm can only be
used where a complete dual vector set $\Pi$ (and associated optimal bases set $P$) for
the subset of $X$ has been assembled.
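
As a small illustration of Proposition 3.2, the Python sketch below evaluates the recourse value as the largest product of a stored dual vector with the right-side vector. The four dual vectors are hypothetical; they are the dual-feasible vectors of an assumed simple-recourse problem with W = [I, -I] and costs d = (3, 4, 1, 2), not data from APL1P.

```python
def odv_value(t, dual_vectors):
    """Return max over stored dual vectors pi_l of pi_l . t (Proposition 3.2).
    Every vector must be examined; there is no early exit as in OBS."""
    return max(sum(p * t_i for p, t_i in zip(pi, t)) for pi in dual_vectors)

# Hypothetical complete dual vector set for W = [I, -I], d = (3, 4, 1, 2):
# each pi satisfies -1 <= pi_1 <= 3 and -2 <= pi_2 <= 4, so all are dual feasible.
dual_vectors = [(3, 4), (3, -2), (-1, 4), (-1, -2)]

print(odv_value((5, -2), dual_vectors))
print(odv_value((-1, -1), dual_vectors))
```

Each evaluation costs one inner product per dual vector, which is the source of the operation count discussed in the text.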

However, the ODV Algorithm may still offer computational advantages
over either the OSL or OBS-Complete versions for cases where a complete dual
vector set $\Pi$ can be constructed. For a recourse problem with $m$ constraints, each
basis requires no more than $m^2$ multiplications and $m(m - 1)$ additions; thus,
a set $P$ containing $L$ bases would (worst case) require $L(2m^2 - m)$ arithmetic
operations. By contrast, each dual vector requires $m$ multiplications and $(m - 1)$
additions, and for $\Pi$ containing $L$ dual vectors the ODV Algorithm requires
$L(2m - 1)$ operations. Clearly, in most cases the OBS algorithm will not realize
the upper bound on the number of operations, due to compact storage for sparse
matrices; the potential to skip to the next basis upon finding a negative element in
the current solution vector; and the ability to stop after finding a feasible basis
and not search the remaining elements in set $P$. However, comparing the ratio of
the worst case bound for OBS-Complete with the known arithmetic operations
requirement of ODV shows that the combined effects of the above-mentioned
items would have to reduce the computational demands of the OBS-Complete
Algorithm by a factor of $m$ on average in order for it to compete with the
ODV Algorithm. Therefore, the ODV Algorithm will most likely outperform
the OBS-Complete Algorithm except for small sets $P$ and $\Pi$ with a very skewed
distribution of the frequency of optimality among the bases in $P$.
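
The factor of $m$ follows directly from the two operation counts. The value $m = 10$ in the second line is an assumed figure chosen only to make the ratio concrete; it is not taken from the source.

```latex
\[
\frac{\text{OBS worst case}}{\text{ODV}}
  = \frac{L\,(2m^{2} - m)}{L\,(2m - 1)}
  = \frac{m\,(2m - 1)}{2m - 1}
  = m .
\]
\[
m = 10:\qquad L\,(2m^{2} - m) = 190\,L, \qquad L\,(2m - 1) = 19\,L, \qquad \frac{190\,L}{19\,L} = 10 .
\]
```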

3.3.6  OBS-Based  Results  for  APL1P 

The APL1P problem's low dimensionality and small probability space
make it a good candidate for the OBS-Complete Algorithm. The major
difficulty lies in defining the requirements space for $\omega - Tx$; again, the range of
$Tx$ poses the biggest hurdle. By inspection it is clear that $x$ is unbounded from
above and each element must be greater than 1. However, it is equally clear that a
practical bound exists based on the upper limits of demand $\omega$; indeed, the highest
possible demand from either supply 1 or 2 would be where all elements of $\omega$ are
1200 and only one supply node (1) satisfies the total demand of 3,600 (an event
whose probability is $6.75 \times 10^{-5}$). Therefore, for this problem the range of $x_i$ is
[1, 3600].

Another complicating factor concerns the stochastic elements in $T$. In the
case of APL1P they have no effect on the range of the requirement space, since
the highest possible multiple in $T$ renders any $x_i > 3600$ meaningless. (As a
counterexample, if the highest multiple in $T$ were .5, then $x_i \in [1, 7200]$.) This
means that 4 independent discrete states for each of the three elements of $\omega$ and
for the stochastic element of $T$ indexed 11, and 5 for the element indexed 22,
result in $4^4 \cdot 5 = 1280$ independent possible realizations of the stochastic
parameters, or scenarios. Obviously a random sampling of these scenarios would
provide a faster evaluation of $Z(x)$, and the topic is taken up in Chapter 4.
However, for the search techniques and optimization options discussed in the
present chapter all 1280 scenarios are evaluated (i.e., every $Z(x)$ represents the
true expected value of the objective function for (3.1a)).

Using  the  data  provided  in  Figure  3.1,  the  OBS-Complete  Algorithm 
found  13  bases  (and  associated  dual  vectors)  that  thoroughly  describe  the 
requirements  space  for  APL1P.  Table  3.4  provides  the  search  data  and 
computational  results,  and  Table  3.5  follows  with  estimates  on  the  frequency  of 
optimality for each basis/dual vector. For these and subsequent results, the
reported computational times represent the operating system's estimate of the
CPU time required to execute the algorithm; they do not include system I/O time
or overhead (IBM 1992). All examples and problems were written and compiled
in FORTRAN 90, and run on an IBM RS/6000 Model 320 under AIX 3.2.

Tables 3.6 through 3.8 compare computational times of the three
optimization options for each of the three search techniques discussed in Sections
3.2.2 through 3.2.7. Each table contains two independent search paths, with the
first entry corresponding to the example search path shown in Figures 3.3, 3.4,
and 3.6. In all cases the OBS-Complete and ODV options are faster than OSL by
at least an order of magnitude, thus justifying the upfront cost of finding the
optimal basis/dual vector set.

Table 3.4

OBS-Complete Results for APL1P

Sample Size      Random #      All Bases /        # Opt. Bases/     CPU Time
($\omega - Tx$)        Seed          First Optimal*     Dual Vectors      (secs)
----------------------------------------------------------------------------
1000             873946        All                10                  3.81
5000             4209175       All                11                  5.80
25000            3366149       All                12                 29.34
200000           66231850      First              13                139.55

* - The 'First Optimal' option skips any remaining bases after finding the first
feasible basis, whereas 'All Bases' checks every basis in $P$ for each sample
($\omega - Tx$).


Table 3.5

Frequency of Basis Optimality
(Based on 4th run from Table 3.4)

Basis ID #            1    2    3    4    5    6    7    8    9    10   11   12   13
Freq. of Optimality   .19  .18  .12  .11  .10  .10  .08  .06  .04  .04  .00  .00  .00
Cumulative Freq.*     .19  .36  .48  .58  .68  .78  .86  .92  .96  1.0  1.0  1.0  1.0

Lowest x Sampled      (0.85, 0.00)
Highest x Sampled     (3596.36, 3561.52)

* - May Not Add Due to Roundoff Error


Table 3.6

Computation Times of OSL/OBS/ODV Options of Geometric Simplex
Algorithm for Twenty Simplex Iterations of APL1P (in seconds)

Starting Simplex Parameters                                   OSL      OBS      ODV
------------------------------------------------------------------------------------
x1 = (1529,1625), x2 = (2437,2681), x3 = (2502,3462)       274.07    19.77    16.62
  Random Seed = 440908571, 20 Total Vertices
x1 = (1529,1625), x2 = (725,718), x3 = (3203,3561)         171.64    13.53    10.39
  Random Seed = 707446, 20 Total Vertices

Table 3.7

Computation Times of OSL/OBS/ODV Options of Projected Gradient
Algorithm for APL1P (in seconds)

Search Parameters                               OSL      OBS      ODV
----------------------------------------------------------------------
x1 = (3600, 3600), Q = 6, Iterations = 10    612.22    25.09    23.05
x1 = (2, 2), Q = 8, Iterations = 10          807.39    32.31    30.29


Table 3.8

Computation Times of OSL/OBS/ODV Options of PARTAN Algorithm for
APL1P (in seconds)

Search Parameters                               OSL      OBS      ODV
----------------------------------------------------------------------
x1 = (3600, 3600), Q = 6, Iterations = 4     340.62    17.48    13.09
x1 = (2, 2), Q = 6, Iterations = 4           346.78    17.71    16.28
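
The order-of-magnitude claim for Tables 3.6 through 3.8 can be checked with a few lines of arithmetic. The Python sketch below copies the reported CPU times from those tables and forms the OSL/OBS and OSL/ODV ratios; the row labels are only descriptive names introduced here.

```python
# CPU seconds (OSL, OBS, ODV) as reported in Tables 3.6 through 3.8.
timings = {
    "Table 3.6, path 1": (274.07, 19.77, 16.62),
    "Table 3.6, path 2": (171.64, 13.53, 10.39),
    "Table 3.7, path 1": (612.22, 25.09, 23.05),
    "Table 3.7, path 2": (807.39, 32.31, 30.29),
    "Table 3.8, path 1": (340.62, 17.48, 13.09),
    "Table 3.8, path 2": (346.78, 17.71, 16.28),
}

# Speedup of each preprocessed option relative to the OSL option.
speedups = {name: (osl / obs, osl / odv)
            for name, (osl, obs, odv) in timings.items()}
min_speedup = min(min(pair) for pair in speedups.values())

for name, (s_obs, s_odv) in speedups.items():
    print(f"{name}: OSL/OBS = {s_obs:.1f}, OSL/ODV = {s_odv:.1f}")
print(f"smallest speedup: {min_speedup:.1f}")
```

The smallest ratio comes from the second search path of Table 3.6 (OSL against OBS) and still exceeds 10.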


Chapter  4 


Methodology:  Statistical  Analysis 

4.1  Introduction 

Restating  (3.1),  this  dissertation  restricts  its  focus  to  the  class  of  two-stage 
stochastic  linear  programming  problems  with  recourse  of  the  form 

$$\min\ Z(x) = cx + E[h(x,\omega,T)], \quad \text{s.t. } Ax = b,\ x \ge 0 \qquad (3.1a)$$

$$h(x,\omega,T) = \min\ dy, \quad \text{s.t. } Wy = \omega - Tx,\ y \ge 0 \qquad (3.1b)$$

where a finite mean and variance exist for each component of $\omega$ and $T$. The
previous  chapter's  focus  on  search  techniques  and  optimization  methods  falls 
within  the  traditional  framework  of  solving  stochastic  recourse  problems  of  the 
form  (3.1)  by  synthesizing  previously  disparate  ideas  into  a  unified  and  unique 
methodology  for  solving  (3.1a).  However,  these  computational  advances  merely 
improve  upon  an  existing  analytical  paradigm  without  offering  additional  insight 
into  the  problem.  When  viewed  from  a  statistical  perspective,  though,  (3.1a) 
presents  a  completely  different  challenge  to  the  analyst  and  decision-maker.  This 
challenge is not simply the complications in finding the optimal solution $x^*$ posed
by the random variables present in $\omega$ and $T$ (although this is certainly an issue and
a subject of much research). Instead, the nature of the problem itself is altered in
the following fundamental way: Any realistic and useful answer to (3.1a)
becomes less one of strict optimality and more an issue of (1) sensitivity and (2)
variability. This chapter explores this new analytical approach starting with a
review of the following primary definitions from Chapter 3.

Definition. Let $X = \{x : Ax = b,\ x \ge 0\}$.

Definition. Let $x^*$ represent the optimal solution that minimizes $Z(x)$; i.e.,
$Z(x^*) = \min Z(x)$. Define the region of optimality as the set
$\{x' : Z(x') \le Z(x^*) + \varepsilon,\ \varepsilon > 0,\ x' \in X\}$; i.e., those feasible solutions $x'$ whose
objective values $Z(x')$ are near-optimal (as defined by $\varepsilon$).

Definition. Let $z_{ik} = cx_k + h(x_k, \omega_i, T_i)$; i.e., $z_{ik}$ represents the objective
value given $x_k$ for the $i$th realization of $\omega$ and $T$. Further define the
independent random variable $z_k$ distributed as $cx_k + h(x_k, \omega, T)$, where
$E[z_k] = Z(x_k)$.

The first type of inquiry involves an area of analysis called response
surface methodology that deals with the shape of the response function $Z(x)$, and
is motivated by the following observations. First, the traditional optimal answer
minimizes the first moment of the recourse function, even though $x^*$ may have
undesirable characteristics that near-optimal solutions $x'$ may not share. If the
region of optimality is 'flat', however, then the difference between $Z(x^*)$ and $Z(x')$
may be small enough to justify eliminating the unwanted attributes of $x^*$. Using
APL1P as an example, there does indeed exist an optimal answer
($x^* = (1801.9, 1571.4)$, $Z(x^*) = 24642.3$); yet a quick glance at the figures and data
in Chapter 3 reveals a region of extremely low sensitivity of $Z(x)$ to changes in $x$.
For instance, in Table 3.2 where $x_2 = (1394, 1969)$ and $Z(x_2) = 24693.1$, a mere .2%
increase in the expected optimal objective function value occurs for a -22.6% and
25.3% change in the first and second components of $x$, respectively. Similarly,
there exists an (infinitely) large number of near-optimal solutions $x'$ that roughly
follow a line of -1 slope running through the center of the ellipsoid. Therefore,
unless an analyst could claim that (3.1) truly captures all relevant objectives and
constraints, or that no subjective criteria influence the decision-maker, some
measure of solution sensitivity becomes necessary.

Extending this idea, the second (and equally important) aspect of
sensitivity analysis concerns knowing where not to move. For instance,
proceeding an equal geometric distance in a direction orthogonal to the vector
from $x^*$ to $x_2$ in APL1P gives $x_0 = (2199.5, 1979.3)$ and $Z(x_0) = 25244.3$. This
produces a 2.44% increase in objective function value (over 11 times as sensitive
as compared to $x_2$) for a 22.1% and 26.0% change in the first and second
components of $x$, respectively. Generalizing such an approach, this chapter will
show how special canonical transformations of the original response surface
furnish such minimal and maximal ridge analysis for $n$-dimensional problems,
thus providing a very important and basic tool for characterizing $Z(x)$.
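
The sensitivity figures quoted for APL1P can be reproduced from the three points given in the text. The Python sketch below uses only those reported values; the helper names are introduced here for illustration.

```python
import math

# Points and objective values as reported in the text for APL1P.
x_star, z_star = (1801.9, 1571.4), 24642.3
x_2, z_2 = (1394.0, 1969.0), 24693.1
x_0, z_0 = (2199.5, 1979.3), 25244.3

def pct_change(new, old):
    """Percentage change of new relative to old."""
    return 100.0 * (new - old) / old

def step(x):
    """Displacement from the optimal solution x_star to x."""
    return [a - b for a, b in zip(x, x_star)]

dz_2, dz_0 = pct_change(z_2, z_star), pct_change(z_0, z_star)
dx_2 = [pct_change(a, b) for a, b in zip(x_2, x_star)]
dx_0 = [pct_change(a, b) for a, b in zip(x_0, x_star)]

# The two displacements should have equal length and zero inner product.
dist_2, dist_0 = math.hypot(*step(x_2)), math.hypot(*step(x_0))
dot = sum(a * b for a, b in zip(step(x_2), step(x_0)))

print(f"x_2: dZ = {dz_2:.2f}%, dx = ({dx_2[0]:.1f}%, {dx_2[1]:.1f}%)")
print(f"x_0: dZ = {dz_0:.2f}%, dx = ({dx_0[0]:.1f}%, {dx_0[1]:.1f}%)")
print(f"sensitivity ratio = {dz_0 / dz_2:.1f}, inner product = {dot:.6f}")
```

Both moves have the same length, yet the objective changes more than 11 times as much along the second direction.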

Finally, several aspects of the distribution of $h(x,\omega,T)$ present another
major reason why settling for the solution $x^*$ can be deceptive. First, although
$Z(x^*)$ by definition provides the minimum expected value, decision-makers are
often risk-averse; in short, instead of minimizing expected value they may wish
to avoid the worst-case scenario. (For example, an optimal $x^*$ may be less
desirable than a near-optimal solution $x'$ if
$\mathrm{Var}[h(x',\omega,T)] \ll \mathrm{Var}[h(x^*,\omega,T)]$.) Second, as the literature clearly suggests, the
distribution of $h(x,\omega,T)$ itself is a function of $x$, and may not necessarily follow a
symmetric distribution (preliminary research found several empirical examples of
highly skewed, asymmetric distributions). Consequently, any unstated assumptions
on the part of the decision-maker regarding the distributional form of $h(x,\omega,T)$
can be misleading when used in conjunction with just $E[h(x,\omega,T)]$. Therefore,
non-parametric analyses, such as tolerance limits and quantile-based statistics
(such as the median), can provide very practical information to supplement
expected values when comparing $x'$ to $x^*$, or for simply understanding the
underlying behavior of $h(x^*,\omega,T)$.

The third aspect of $h(x,\omega,T)$, variance, also influences both the search
techniques discussed in the previous chapter and the validity of the response
surface approximation of $Z(x)$. This occurs because the tremendously large
number of scenarios associated with many recourse problems requires a sampling
of the probability space of $\omega - Tx$, and the sampling error in the corresponding
estimates of the expected values reduces the accuracy of the experimental design
and response surface analysis. Under these conditions, variance reduction becomes
especially important; it not only increases computational efficiency for the search
techniques, but also reduces the adverse impact of sample variance, which in
turn improves the accuracy of the polynomial approximation of the response $Z(x)$.
Although some work has been done on variance reduction in the context of
improved estimators, the literature does not offer prior research on these two
important aspects of the problem from the perspective of the decision-maker.

This dissertation proceeds from the premise that these two characteristics,
sensitivity and variance, provide the most important insights into the
behavior of (3.1a). This chapter presents the techniques used to accomplish these
two goals under the following topics: Variance Reduction, Experimental Design,
Response Surface Analysis, and Distribution Analysis.

4.2  Variance  Reduction 
4.2.1  Introduction 

When the probability models representing the stochastic elements of $\omega$ and
$T$ contain a large number of discrete realizations, or take a continuous form, then
an unbiased estimate $\hat{Z}(x)$ of the true population mean $Z(x)$ through sampling of
the population becomes necessary. Recalling the definition of $z_{ik}$, the first
unbiased estimator of $Z(x_k)$ used in this dissertation employs random sampling of
size $I$, where

$$Z_{RS}(x_k) = I^{-1} \sum_{i=1}^{I} z_{ik}. \qquad (4.1)$$

It follows directly that since $Z_{RS}(x_k)$ itself is a random variable, it has a variance
$\mathrm{Var}[Z_{RS}(x_k)]$ defined as

$$\mathrm{Var}[Z_{RS}(x_k)] = \mathrm{Var}\Big[I^{-1} \sum_{i=1}^{I} z_{ik}\Big] = I^{-2} \Big[\sum_{i=1}^{I} \mathrm{Var}(z_k)\Big] = \frac{\sigma_k^2}{I} \qquad (4.2)$$

where an unbiased estimate $S_k^2(I)$ of $\sigma_k^2$ can be found using the relation

$$S_k^2(I) = (I-1)^{-1} \sum_{i=1}^{I} [z_{ik} - Z_{RS}(x_k)]^2 \qquad (4.3)$$

(Law and Kelton 1991). However, the problem with using the random sample
estimate $Z_{RS}(x_k)$ for $Z(x_k)$ lies precisely with (4.2): such variability implies
error. The ramifications of estimator variance on the search process have already
been addressed (e.g., Barton and Ivey (1991) and the false convergence of the
Geometric Simplex Algorithm); however, estimator variability can also
profoundly affect the accuracy and validity of the response surface estimates. For
example, if a single point estimator $Z_{RS}(x_k)$ lying in the 'flat' region of
near-optimality as part of an experimental design underestimates the true response
$Z(x_k)$, then the resulting polynomial approximation could easily assume the
surface curves downward in the direction represented by the errant design point.
This in turn would produce a response surface shape resembling a saddle where
the true shape is known to be convex (see Box and Draper 1987).
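
Estimators (4.1) through (4.3) can be sketched in a few lines of Python. The function name and the `draw_z` callback are assumptions made for illustration: `draw_z` stands for one simulated evaluation of $z_{ik}$ at a fixed $x_k$, which in the dissertation would involve solving the recourse problem.

```python
def random_sampling_estimate(draw_z, sample_size, rng=None):
    """Return (Z_RS, S2, var_of_Z_RS) for one first-stage vector.

    draw_z(rng) is a hypothetical callback returning one observation z_ik.
    """
    z = [draw_z(rng) for _ in range(sample_size)]
    z_rs = sum(z) / sample_size                                  # (4.1)
    s2 = sum((z_i - z_rs) ** 2 for z_i in z) / (sample_size - 1)  # (4.3)
    return z_rs, s2, s2 / sample_size             # plug-in estimate of (4.2)

# Deterministic usage example with four fixed observations.
observations = iter([1.0, 2.0, 3.0, 4.0])
z_rs, s2, var_z_rs = random_sampling_estimate(lambda _rng: next(observations), 4)
print(z_rs, s2, var_z_rs)
```

Because the last returned value shrinks only in proportion to 1/I, halving the standard error by brute force requires four times as many recourse evaluations.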

As the name implies, variance reduction techniques (VRTs) attack this
problem by trying to reduce the value of (4.2) in an efficient manner. Obviously,
the sample variance can be decreased by increasing the sample size $I$, but this
becomes computationally prohibitive for large recourse problems. The preferred
approach reduces sample variance with the same $I$, or equivalently gives (4.2) the
same magnitude with fewer simulations. There is a wide variety of VRTs; Law
and Kelton (1991) provide an excellent description of the major VRT categories,
and suggest secondary references Nelson (1987) and Wilson (1984) for more
comprehensive reviews. However, as noted in Chapter 2 the application of VRTs
to the recourse problem in the literature remains limited, requiring much
additional work on both increased individual point estimator efficiency and its
larger effects on response surface approximation. This research contributes to this
area by exploring two VRTs: Control Variates and Latin Hypercube.

4.2.2  Control  Variates 

In very general terms, most VRTs attempt to reduce the sample variance
by correlating some internal aspect of the simulation to the response being
estimated. For instance, Common Random Numbers (CRNs) compare alternative
configurations under the same random number stream; Antithetic Variables (AVs)
use complementary random numbers under the assumption that the opposing pairs
are negatively correlated; and Conditioning Estimation (CE) employs known
analytical values in lieu of estimates where possible (Law and Kelton 1991). For
reducing the sample variance in the recourse context, Control Variates (CVs) is an
attractive technique for the reasons outlined in Law and Kelton (1991):

1.  Correlation.  Contrary  to  CRN  and  AV  methods,  CVs  work  with  either 
positive  or  negative  correlation. 

2.  Simplicity.  CVs  do  not  require  separate  runs  like  CRNs,  or  synchronized 
replication  like  AVs. 

3. Effectiveness. If any correlation exists between a control variate and $z_{ik}$,
then CVs will reduce the sample variance.

4. Efficiency. Using internal CVs does not appreciably increase the
computational requirements of the simulation.

Finally, and most importantly, this dissertation contends that the recourse
problem presents an evident set of random variables that strongly support CVs as
an effective and efficient VRT candidate.

Following  Law  and  Kelton's  (1991)  presentation,  CVs  assume  that  a 
random  variable  X  with  a  known  expectation  E[X]  is  correlated  (positively  or 
negatively)  with  the  simulation  response  Y,  where  E[Y]  is  unknown  and  thus 
estimated.  Intuitively,  if  during  a  simulated  observation  X  is  greater  than  E[X], 
then  the  resulting  response  Y  should  also  be  greater  (positive  correlation)  or  lesser 
(negative  correlation)  than  E[Y]  and  adjusted  accordingly.  This  relationship  can 
be  expressed  mathematically  as 

Yc  =  Y  -  b(X  -  E[X])  (4.4) 

where  Yc  represents  the  controlled,  unbiased  estimator  for  E[Y];  and,  b  is  a 
constant  whose  sign  corresponds  to  the  correlation  between  Y  and  X,  and  whose 
value  quantifies  the  adjustment.  Taking  the  variance  of  (4.4)  gives 

Var[Yc]  =  Var[Y]  +  b2V\R[X]  -  2i>Cov[Y;X]  (4.5) 


from  which  it  immediately  follows  that  Var[Yc]  <  Var[Y]  if  and  only  if 

2&Cov[Y;X]  >  Z?2Var[X]  .  (4.6) 


Regarding  Var[Yc]  as  a  function  of  b  and  setting  its  derivative  to  zero  gives 


$b^* = \frac{\mathrm{Cov}[Y;X]}{\mathrm{Var}[X]}$  (4.7)

whose substitution for $b$ in (4.5) yields

$\mathrm{Var}[Y_c^*] = (1 - \rho_{YX}^2) \cdot \mathrm{Var}[Y]$  (4.8)

for the minimum variance adjusted estimator $Y_c^*$, where $\rho_{YX}$ is the correlation
between Y and X. From (4.8) it follows that if Y and X are at all correlated then
$\mathrm{Var}[Y_c^*] < \mathrm{Var}[Y]$; and, the higher the correlation the greater the variance
reduction (Law and Kelton 1991; also see Kleijnen 1974, Lavenberg and Welch
1981, Nelson 1987, and Nelson 1990).
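
The scalar adjustment in (4.4) and the coefficient in (4.7) can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the function name `control_variate_mean` and the toy data are hypothetical, and when no known Var[X] is supplied the sketch falls back on the sample variance, which the derivation above does not require.

```python
import statistics


def control_variate_mean(ys, xs, mean_x, var_x=None):
    """Average of Y_c = Y - b (X - E[X]) with b estimated as Cov[Y;X] / Var[X].

    ys, xs : paired observations of the response Y and the control X
    mean_x : the known expectation E[X]
    var_x  : the known Var[X]; if None, the sample variance is used instead
             (an assumption of this sketch)
    Returns (controlled estimate of E[Y], estimated coefficient b).
    """
    n = len(ys)
    y_bar = statistics.fmean(ys)
    x_bar = statistics.fmean(xs)
    # Sample covariance between the response and the control.
    cov_yx = sum((y - y_bar) * (x - x_bar) for y, x in zip(ys, xs)) / (n - 1)
    if var_x is None:
        var_x = statistics.variance(xs)
    b_hat = cov_yx / var_x
    # Averaging (4.4) over the sample gives y_bar - b (x_bar - E[X]).
    return y_bar - b_hat * (x_bar - mean_x), b_hat


# Toy data: Y = 2X + 1 exactly, with E[X] = 0.5, so E[Y] = 2.
xs = [0.1, 0.2, 0.4, 0.9]
ys = [2.0 * x + 1.0 for x in xs]
controlled, b_hat = control_variate_mean(ys, xs, mean_x=0.5)
```

In this toy case Y is perfectly correlated with X, so the controlled estimate recovers E[Y] even though the plain sample mean of Y (1.8) does not; with weaker correlation the correction is only partial, as (4.8) indicates.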

The relationship described in (4.4) is present in the recourse problem (3.1)
by observing that the estimator $z_{ik}$ should be correlated with one or more of the
random components of $\omega$ and $T$.

Definition. Define the P-dimensional vector $v$ whose elements $v^p$,
$p = 1, \ldots, P$, represent selected components of $\omega$ or $T$ with known $E[v^p]$
and $\mathrm{Var}[v^p]$.

Definition. Let $\mu^p = E[v^p]$ and define the P-dimensional vector $\mu =
[\mu^1, \mu^2, \ldots, \mu^P]$.

Definition. Let $b^*_{v^p} = \frac{\mathrm{Cov}[z_{ik}; v^p]}{\mathrm{Var}[v^p]}$.

The previous definitions allow for an unbiased minimum variance adjusted estimator for
the $i$th realization of $v^p$

$z^*_{ik} = z_{ik} - b^*_{v^p}(v_i^p - \mu^p)$,  (4.9)

where the unbiased CV estimator of $Z(x_k)$ is

$\hat{Z}_{CV}(x_k) = (I)^{-1} \cdot \sum_{i=1}^{I} z^*_{ik}$  (4.10)

and where $\mathrm{Var}[\hat{Z}_{CV}(x_k)] \le \mathrm{Var}[\hat{Z}_{RS}(x_k)]$ (Law and Kelton 1991). While a
correlation almost certainly exists between $z_{ik}$ and the random variables in $\omega$ and
$T$, which stochastic element (or combination of elements) in $\omega$ and $T$
provides the greatest variance reduction as a control variate is not at all apparent.
Answering this question raises the following issues of CV selection, bias, and
non-stationarity.

4.2.2.1  Control Variate Selection

Following Lavenberg and Welch (1981), all previous scalar CV
formulations can be extended to include multiple CVs; i.e., replacing $v^p$ with the
vector of controls $v$ gives

$z^*_{ik} = z_{ik} - b^*_v (v_i - \mu)$  (4.11)

where

$b^*_v = \sigma_{vz} \cdot [\Sigma_v]^{-1}$,  (4.12)

$\Sigma_v$ is the covariance matrix for $v$, $\sigma_{vz}$ is the covariance vector for $v$ and $z_{ik}$, the
elements of the P-dimensional vector $v_i$ are the $i$th realization of the random
variable $v^p$, $p = 1, \ldots, P$, and the elements of the vector $b^*_v$ characterize the
optimum coefficients for maximum variance reduction. However, in so doing the
loss factor

$\frac{I - 2}{I - P - 2}$  (4.13)

occurs due to the estimation requirements of (4.12) discussed shortly (Lavenberg
and Welch 1981; also see Law and Kelton 1991, and Rubenstein and Marcus
1985).

While (4.13) encourages a parsimonious selection of multivariate CVs, it
still does not decide which random variables in $\omega$ or $Tx$ to select. This research
addresses these topics in the following manner. For $v^p$ select a component of $\omega$ or
$T$ assumed to be most closely correlated to $z_{ik}$ based on empirical observations
from search results, preliminary screening designs, or subjective knowledge of the
underlying system. Reduce the loss factor described in (4.13) and minimize the
computational requirements implicit in estimating (4.12) by letting $P \le 3$. This
heuristic follows the findings of Rubenstein and Marcus (1985) suggesting that
too large a P can over-correct the controlled estimator.
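
To make the effect of (4.13) concrete, the following Python sketch evaluates the loss factor for a given sample size I and number of controls P. The function name `loss_factor` is hypothetical, and the sample size of 50 is used purely as an illustration.

```python
def loss_factor(sample_size, num_controls):
    """Loss factor (I - 2) / (I - P - 2) from (4.13).

    sample_size  : I, the number of observations used to estimate the coefficients
    num_controls : P, the number of control variates
    """
    denominator = sample_size - num_controls - 2
    if denominator <= 0:
        raise ValueError("sample size must exceed the number of controls plus 2")
    return (sample_size - 2) / denominator


# Variance inflation from estimating the coefficients for P = 1, 2, 3 controls.
factors = {p: loss_factor(50, p) for p in (1, 2, 3)}
```

For I = 50 the factor grows from 48/47 with one control to 48/45 with three, so each additional control must contribute enough correlation to offset a small but increasing penalty.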

4.2.2.2  Bias Estimators

While $\mathrm{Var}[v^p]$ is known, $\mathrm{Cov}[z_{ik}; v^p]$ is not (the same holds true for $\Sigma_v$
and $\sigma_{vz}$, respectively, in the multiple CV case); therefore, the covariance
relationship must be estimated. Unfortunately, such estimation creates a biased
estimator of Z(x) as Law and Kelton (1991) show in the scalar case. Substituting
the estimator $\hat{b}_{v^p}$ for $b^*_{v^p}$ in (4.9), where

$\hat{b}_{v^p} = \frac{\widehat{\mathrm{Cov}}[z_{ik}; v^p]}{\mathrm{Var}[v^p]}$  (4.14)

gives the estimator for $\mathrm{Cov}[z_{ik}; v^p]$ as

$\widehat{\mathrm{Cov}}[z_{ik}; v^p] = (I - 1)^{-1} \cdot \sum_{i=1}^{I} (z_{ik} - \bar{Z}(x_k))(v_i^p - \hat{\mu}^p)$  (4.15)

where $\bar{Z}(x_k)$ is the sample mean of the $z_{ik}$, and by letting $\hat{\mu}^p$ (the sample
estimate for $\mu^p$) be defined as

$\hat{\mu}^p = (I)^{-1} \cdot \sum_{i=1}^{I} v_i^p$.  (4.16)

In turn, this gives the new observation as

$z^*_{ik} = z_{ik} - \hat{b}_{v^p}(v_i^p - \mu^p)$.  (4.17)

Consequently the $z^*_{ik}$, $i = 1, \ldots, I$, are no longer independent since they all contain
$\hat{b}_{v^p}$. Therefore, expectations cannot be taken across (4.17) in the same manner as
for (4.9) (Law and Kelton 1991).

Like its scalar counterpart, $b^*_v$ must also be estimated, and therefore its
estimator will also be biased for the same reasons. Following Lavenberg,
Moeller, and Welch (1982) a biased estimate associated with $b^*_v$ follows from
using the multiple CV version of (4.14),

$\hat{b}_v = \hat{\sigma}_{vz} \cdot [\hat{\Sigma}_v]^{-1}$,  (4.18)

where $\hat{\Sigma}_v$ is the sample covariance matrix whose $qp$th component (rows indexed
on $q = 1, \ldots, P$, and columns indexed on $p = 1, \ldots, P$) is

$\hat{\Sigma}_v^{qp} = (I - 1)^{-1} \cdot \sum_{i=1}^{I} (v_i^q - \bar{v}^q)(v_i^p - \bar{v}^p)$  (4.19)

and $\bar{v}^p$ represents the sample estimate for $v^p$.

Similarly, the sample covariance vector $\hat{\sigma}_{vz}$ uses the relation

$\hat{\sigma}_{vz}^p = (I - 1)^{-1} \cdot \sum_{i=1}^{I} (z_{ik} - \bar{Z}(x_k))(v_i^p - \bar{v}^p)$  (4.21)

to estimate $\sigma_{vz}$ (Lavenberg, Moeller, and Welch 1982). One alternative for
eliminating biased estimators due to dependent estimates of $\hat{b}_{v^p}$ (4.14) or $\hat{b}_v$ (4.18)
with $z_{ik}$ would be to estimate $b^*_{v^p}$ or $b^*_v$ using separate data; however, that would
increase the number of samples for $x_k$. Instead, a more efficient approach uses
jackknifing (or generalized splitting) (Kleijnen 1974; Lavenberg, Moeller, and
Welch 1982; and, Miller 1974).

This research implements a simple jackknife estimate using the following
procedure as described by Lavenberg, Moeller, and Welch (1982).

Definition. Let $\bar{Z}(x_k) = (I)^{-1} \cdot \sum_{i=1}^{I} z_{ik}$.

Definition. Let $\bar{v} = (I)^{-1} \cdot \sum_{i=1}^{I} v_i$.

Definition. Let the set $J_k = \{z_{1k}, \ldots, z_{Ik}; v^1_{1k}, \ldots, v^1_{Ik}; \ldots; v^P_{1k}, \ldots, v^P_{Ik}\}$
represent all observations of $z_{ik}$ and $v_{ik}$, $i = 1, \ldots, I$, for $x_k$.

Temporarily dropping the k subscript for convenience, the biased estimator for
Z(x) becomes

$\hat{Z}(x) = \bar{Z}(x) - \hat{b}_v(\bar{v} - \mu)$.  (4.22)

Partitioning the I observations of set J into G sets of equal length H such that
$I = GH$, let index g denote the set of observations

$\{z_g, v_g\} = \{(z_{g1}, v_{g1}), (z_{g2}, v_{g2}), \ldots, (z_{gH}, v_{gH})\}$,  $g = 1, \ldots, G$  (4.23)

where, if $\{z_i, v_i\} \subset J$, $\{z_l, v_l\} \subset J$, and $i \ne l$, then $\{z_i, v_i\} \cap \{z_l, v_l\} = \emptyset$. Let $\hat{Z}_g(x)$
represent an estimator of Z(x) using the same formulation of (4.22) without
$\{z_g, v_g\}$; i.e., $\hat{Z}_g(x)$ estimates Z(x) using all observations from J except the subset
$\{z_g, v_g\}$. Similarly, let $\hat{b}_{v,g}$ represent the estimator $\hat{b}_v$ calculated from all the data
in J except $\{z_g, v_g\}$ using (4.18). Then G distinct estimates of Z(x) can be found
using the relationship

$\phi_g = G\hat{Z}(x) - (G - 1)\hat{Z}_g(x)$,  (4.24)

where $\phi_g$ represents a weighted adjusted estimate based on how much $\hat{Z}_g(x)$
deviates from $\hat{Z}(x)$

$\phi_g = G \cdot [\hat{Z}(x) - \hat{Z}_g(x)] + \hat{Z}_g(x)$,  (4.25)

which in turn gives the reduced-bias jackknifed estimator of Z(x)

$\hat{Z}_G(x) = (G)^{-1} \cdot \sum_{g=1}^{G} \phi_g$  (4.26)

(Lavenberg, Moeller, and Welch 1982; also see Nelson 1990).

Using the jackknifed estimator raises several implementation issues. First,
one must decide how to partition I to balance the computation requirements of
(4.26) against the need for a lower-bias estimator. Kleijnen (1974) reports that
Tocher (1963) shows that for G = 2 the variance of (4.26) is twice that for $\hat{Z}_{CV}(x)$
using a non-stochastic $b^*_v$ in (4.22), and therefore suggests G > 2 as a technique
for reducing it. Although both Kleijnen (1974) and Miller (1974) suggest using G
= I and H = 1 as the best jackknife grouping, such a configuration would be
computationally expensive. As a compromise, this research will assign H = 10
and $I \ge 50$. Second, since $\Sigma_v$ is known from the model's distributional
assumptions, one could skip estimating it, particularly since the assumption of
independence of the elements in $\omega$ and $T$ renders the off-diagonal elements in $\Sigma_v$
zero. However, following Kleijnen (1974) and Nelson (1990), all elements of the
sample covariance matrix $\hat{\Sigma}_v$ will be estimated from the sample data in J using
(4.19-20), even though $\Sigma_v$ is known. (This implies that the off-diagonal elements
of $\hat{\Sigma}_v \ne 0$ due to sampling error.) Finally, Lavenberg, Moeller, and Welch (1982)
also observe that (4.26) does not require any distributional assumptions; hence,
the jackknife procedure will work for any continuous or discrete distributions in $\omega$
and $T$.
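
The jackknife procedure of (4.22) through (4.26) can be sketched in Python for the special case of a single control variate. This is a simplified illustration rather than the implementation used in this research: the function names are hypothetical, only one control is handled (so the matrix inverse in (4.18) reduces to a division), and the groups are taken as consecutive blocks of the sample.

```python
import statistics


def cv_estimate(zs, vs, mu):
    """Scalar analogue of (4.22): z_bar - b_hat (v_bar - mu), with b_hat from the same data."""
    n = len(zs)
    z_bar = statistics.fmean(zs)
    v_bar = statistics.fmean(vs)
    cov = sum((z - z_bar) * (v - v_bar) for z, v in zip(zs, vs)) / (n - 1)
    var = sum((v - v_bar) ** 2 for v in vs) / (n - 1)
    return z_bar - (cov / var) * (v_bar - mu)


def jackknife_cv(zs, vs, mu, group_size):
    """Jackknifed CV estimator (4.26) using G = I / H groups of H consecutive observations."""
    n = len(zs)
    if n % group_size != 0 or n // group_size < 2:
        raise ValueError("I must be a multiple of H with at least two groups")
    num_groups = n // group_size
    full = cv_estimate(zs, vs, mu)  # estimate from all I observations
    pseudo_values = []
    for g in range(num_groups):
        lo, hi = g * group_size, (g + 1) * group_size
        # Estimate with group g deleted, then form the pseudo-value of (4.24).
        leave_out = cv_estimate(zs[:lo] + zs[hi:], vs[:lo] + vs[hi:], mu)
        pseudo_values.append(num_groups * full - (num_groups - 1) * leave_out)
    return statistics.fmean(pseudo_values)


# Toy check: z = 2v + 1 exactly with E[v] = 0.5, so Z = 2.
vs = [0.1, 0.2, 0.4, 0.9, 0.3, 0.7]
zs = [2.0 * v + 1.0 for v in vs]
estimate = jackknife_cv(zs, vs, mu=0.5, group_size=2)
```

With H = 10 and I = 50, as assigned above, the loop would recompute the coefficient five times, each time on the 40 observations left after deleting one group.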

4.2.2.3  Non-Stationarity

Adding the k subscript back, a stationary $b^*_v$ is unlikely over all values of x
because of the changes $x_k$ imposes on (3.1b) through $Tx$. Therefore, $\hat{b}_v$ should be
re-estimated for each $x_k$ independently from other points during a search
technique or experimental design.

4.2.3  Latin  Hypercube 
4.2.3.1  General  Description 

Employing Latin Hypercube (LH) sampling as a method for variance
reduction was suggested by Wilson (1995) based on his recent research in
stochastic activity networks. LH uses an extended stratified sampling structure
proven by McKay, Conover, and Beckman (1979) to guarantee that for any
estimator $\bar{Y}$ of the mean response associated with the random variable Y, where
the equivalent LH estimator is $\bar{Y}_{LH}$, $Y = f(x)$, and $x^i$ (a component of x) represents
a random variable, $\mathrm{Var}[\bar{Y}_{LH}] \le \mathrm{Var}[\bar{Y}]$ if Y is a monotonic function of each $x^i$
in x. Since the objective value changes as a monotonic function of any right-side
value of any inequality constraint, it immediately follows that $\mathrm{Var}[\hat{Z}_{LH}(x_k)] \le
\mathrm{Var}[\hat{Z}_{RS}(x_k)]$ for the class of problems represented by (3.1).

Further  motivation  for  investigating  LH  variance  reduction  in  the  recourse 
context  comes  from  McKay,  Conover,  and  Beckman: 

One  advantage  of  the  Latin  hypercube  sample  appears  when  the  output 
Y(t)  is  dominated  by  only  a  few  of  the  components  of  X.  This  method 
insures  that  each  of  those  components  is  represented  in  a  fully  stratified 
manner,  no  matter  which  components  might  turn  out  to  be  important 
(McKay,  Conover,  and  Beckman  1979, 240). 

This dominance characteristic concurs with the CV assumption of high correlation
between one or more elements of $(\omega - Tx)$ and $z_{ik}$. Second, subsequent empirical
results by Avramidis (1992), Avramidis and Wilson (1995), McKay, Conover,
and Beckman (1979), and Stein (1987) found LHs provide considerable variance
reduction in a variety of simulation environments. Third, McKay, Conover, and
Beckman (1979) show that $E[\hat{Z}_{LH}(x_k)] = Z(x_k)$; i.e., the LH estimator is unbiased
and thus does not need any compensating adjustments (e.g., the jackknifing
procedures of CVs). Finally, unlike CVs, LHs have the advantage of not
requiring a priori knowledge of a random variable with known correlation to the
response. Thus, LHs would be especially useful in the initial search for the region
of optimality, as well as during the experimental design phase.

The  LH  approach  falls  under  the  Correlation  Induction  category  of  VRTs. 
Like  antithetic  variates  (AVs),  LHs  attempt  to  induce  a  negative  correlation 
between  two  or  more  responses  such  that  the  variance  of  the  averaged  responses  is 
less  than  that  of  a  randomly  sampled  estimator.  Beginning  with  the  next  set  of 
definitions  (and  dropping  the  k  subscript  again  for  convenience),  the 
accompanying  explanation  of  the  LH  method  for  the  recourse  problem  follows  the 
description  of  the  general  case  provided  by  Avramidis  and  Wilson  (1995). 

Definition. In general, let $D^T_{ID}$ represent the joint distribution of T
independent variables.

Definition. In general, let the T-variate distribution $D^T_{CI}$ possess the
following properties:

1. Each univariate marginal distribution of $D^T_{CI}$ follows a uniform
(0,1) interval.

2. Every bivariate marginal distribution is negative quadrant
dependent (NQD); i.e., for any bivariate random vector $(B_1, B_2)$ in
$D^T_{CI}$, $\mathrm{Prob}\{B_1 \le b_1, B_2 \le b_2\} \le \mathrm{Prob}\{B_1 \le b_1\} \cdot \mathrm{Prob}\{B_2 \le b_2\}$.

3. All bivariate marginal distributions of $D^T_{CI}$ equal each other.

Definition. Let R represent the number of random variables in $T$ and $\omega$.

Definition. Let $U_{tr}$ represent a random variate drawn on the unit interval
(0,1) for the random column vector $U_{\cdot r}$, where $r = 1, \ldots, R$, and $t = 1, \ldots,
T$. This implies generating the actual random variables in the recourse
problem by using unit interval variates $U_{tr}$ as inputs (e.g., inverse
transformations for continuous variables). The elements $U_{tr}$ also represent
intervals (or strata) of equal probability with respect to the original
distribution of the random variable represented by the column vector $U_{\cdot r}$.

Definition. Define the $i$th sample matrix $U_i$ to consist of the column
vectors $[U_{\cdot 1}, U_{\cdot 2}, \ldots, U_{\cdot R}]$ or equivalently the row vectors $[U_{1\cdot}, U_{2\cdot}, \ldots,
U_{T\cdot}]$.

Definition. Let $z_{it}$ represent the value for $cx + h(x, \omega_{it}, T_{it})$ for the sample
row vector of random variates $U_{t\cdot}$ from $U_i$.

Definition. Define the function $f_{(3.1)}(U_{t\cdot})$ to represent the value of $cx +
h(x, \omega_{it}, T_{it})$ in (3.1) using the random variables $\omega - Tx$ derived from the
input variates $U_{t\cdot}$; i.e., $z_{it} = f_{(3.1)}(U_{t\cdot})$.

The general form of a correlation induction sample $U_i$ is

$U_i = \begin{bmatrix} U_{11} & U_{12} & \cdots & U_{1R} \\ U_{21} & U_{22} & \cdots & U_{2R} \\ \vdots & \vdots & & \vdots \\ U_{T1} & U_{T2} & \cdots & U_{TR} \end{bmatrix}$  (4.27)

where the column vectors $U_{\cdot r}$ fall into two mutually exclusive sets, one group
following distribution $D^T_{CI}$ while the other conforms to $D^T_{ID}$ (and where the second
set of column vectors $U_{\cdot r}$ with distribution $D^T_{ID}$ can be empty). This structure
guarantees that the estimators $z_{it}$, $t = 1, \ldots, T$, derived from the row vectors $U_{t\cdot}$
will be negatively correlated since (4.27) provides dependent rows $U_{t\cdot}$ (hence
dependent $z_{it}$) under the NQD property of $D^T_{CI}$. At the same time, it also ensures
the independence of the column vectors $U_{\cdot r}$ through the random marginal univariate
sampling of the unit interval (0,1) (Avramidis and Wilson 1995).

An examination of antithetic variates (AVs) by Avramidis and Wilson
(1995) provides an easy example of how (4.27) works. AVs form a special case
of (4.27) where T = 2; $U_{1r}$, $r = 1, \ldots, R$, are randomly and independently sampled
from a uniform distribution (0,1); and, $U_{2r} = (1 - U_{1r})$, $r = 1, \ldots, R$. This gives

$U_i = \begin{bmatrix} U_{11} & U_{12} & \cdots & U_{1R} \\ 1 - U_{11} & 1 - U_{12} & \cdots & 1 - U_{1R} \end{bmatrix}$  (4.28)

where $z_{i1} = f_{(3.1)}(U_{1\cdot})$, $z_{i2} = f_{(3.1)}(U_{2\cdot})$, and dependence exists between $U_{1\cdot}$ and $U_{2\cdot}$
through the relationship $U_{2r} = (1 - U_{1r})$. Furthermore, the random sampling of
$U_{1r}$, in conjunction with $U_{2r} = (1 - U_{1r})$, meets the criteria of the distribution $D^2_{CI}$.
Hence, the column vectors $U_{\cdot r}$ are both independent and provide a negative
correlation between $z_{i1}$ and $z_{i2}$ (Avramidis and Wilson 1995). For sample $U_i$, the
antithetic estimate of Z(x) for sample size I is

$\hat{Z}_{AV}(x) = (I)^{-1} \cdot \sum_{i=1}^{I} \bar{z}_i$  (4.29)

where

$\bar{z}_i = \frac{z_{i1} + z_{i2}}{2}$,  (4.30)

and has variance

$\mathrm{Var}[\hat{Z}_{AV}(x)] = \frac{\mathrm{Var}[z_{i1}] + \mathrm{Var}[z_{i2}] + 2\,\mathrm{Cov}[z_{i1}; z_{i2}]}{4I}$.  (4.31)

If $z_{i1}$ and $z_{i2}$ were independent, $\mathrm{Cov}[z_{i1}; z_{i2}] = 0$ and $\hat{Z}_{AV}(x) = \hat{Z}_{RS}(x)$; however,
since monotonicity guarantees that $z_{i1}$ and $z_{i2}$ are negatively correlated,
$\mathrm{Cov}[z_{i1}; z_{i2}] < 0$ and $\mathrm{Var}[\hat{Z}_{AV}(x)] < \mathrm{Var}[\hat{Z}_{RS}(x)]$ (Law and Kelton 1991).
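
The antithetic construction of (4.28) through (4.30) can be sketched in a few lines of Python. The function name `antithetic_estimate` is hypothetical, and the response function passed in below (a plain sum of the inputs) is only a stand-in for $f_{(3.1)}$, chosen because it is monotonic in every input.

```python
import random
import statistics


def antithetic_estimate(f, num_pairs, num_inputs, rng):
    """Average of the pair means (z_i1 + z_i2) / 2 over num_pairs antithetic pairs.

    f          : response function taking a list of unit-interval variates
    num_pairs  : I, the number of antithetic pairs
    num_inputs : R, the number of random inputs per observation
    rng        : a random.Random instance supplying the uniform variates
    """
    pair_means = []
    for _ in range(num_pairs):
        u = [rng.random() for _ in range(num_inputs)]  # first row of (4.28)
        u_anti = [1.0 - x for x in u]                  # second row of (4.28)
        pair_means.append(0.5 * (f(u) + f(u_anti)))    # pair mean (4.30)
    return statistics.fmean(pair_means)                # estimator (4.29)


# Stand-in response: the sum of three uniform inputs, whose true mean is 1.5.
z_av = antithetic_estimate(sum, num_pairs=10, num_inputs=3, rng=random.Random(0))
```

For this linear stand-in the two members of each pair are perfectly negatively correlated, so every pair mean equals 1.5 and the estimator has zero variance; a general monotonic response yields only a partial reduction.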

Returning to Avramidis and Wilson (1995), LH sampling implements the
structure of (4.27) and its supporting assumptions using the relation

$U_{tr} = \frac{\pi_r(t) - 1 + U^{ID}_{tr}}{T}$  (4.32)

under the following definitions.

Definition. Where the ordering of the set of integers {1, ..., T} has T!
permutations, let the function $\pi_r(\cdot)$, $r = 1, \ldots, R$, represent a random
sample (with replacement) of that set of permutations. Furthermore,
define $\pi_r(t)$ as the $t$th element in the $r$th random permutation.

Definition. Let $U^{ID}_{tr}$ be an independent random sample of the unit interval
(0,1) for $t = 1, \ldots, T$, and $r = 1, \ldots, R$.

Note that the elements of the column vectors $U_{\cdot r}$ represent a uniformly stratified
sample of size T randomly ordered by the permutation $\pi_r(\cdot)$. Within each stratum
an independent sampling of the unit interval (0,1) is taken, while the permutation
function $\pi_r(\cdot)$ ensures that every stratum is represented once in each column
vector. Consequently, (4.32) describes a $D^T_{CI}$ distribution since each univariate
marginal distribution follows a uniform (0,1) interval; and, every bivariate
marginal distribution is both NQD and equivalent to any other bivariate marginal
distribution. Furthermore, since both $U^{ID}_{tr}$ and $\pi_r(\cdot)$ are independent, the column
vectors $U_{\cdot r}$ are independent as well. Therefore, by previous definitions (4.32)
provides a negative correlation for row vectors by providing one or more
independent column vectors $U_{\cdot r}$ with distribution $D^T_{CI}$ (Avramidis and Wilson
1995).
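
Relation (4.32) translates almost directly into code. The Python sketch below builds one T x R sample matrix; the function name `latin_hypercube` is hypothetical and the standard library generator stands in for whatever random number streams the simulation actually uses.

```python
import random


def latin_hypercube(num_strata, num_inputs, rng):
    """Return a T x R matrix with U[t][r] = (pi_r(t) - 1 + U_ID) / T as in (4.32).

    num_strata : T, the stratification size (rows)
    num_inputs : R, the number of random inputs (columns)
    rng        : a random.Random instance
    """
    columns = []
    for _ in range(num_inputs):
        # Independent random permutation pi_r of {1, ..., T} for column r.
        perm = list(range(1, num_strata + 1))
        rng.shuffle(perm)
        # One independent uniform draw inside each stratum.
        columns.append([(perm[t] - 1 + rng.random()) / num_strata
                        for t in range(num_strata)])
    # Transpose so that each row is one observation's vector of input variates.
    return [[columns[r][t] for r in range(num_inputs)] for t in range(num_strata)]


sample = latin_hypercube(num_strata=10, num_inputs=3, rng=random.Random(0))
# Stratum index (0 .. T-1) occupied by each element of each column.
strata_by_column = [sorted(int(row[r] * 10) for row in sample) for r in range(3)]
```

Each column visits every one of the T strata exactly once, which is the property the permutation function provides, while the separate shuffles keep the columns independent of one another.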

4.2.3.2  Discrete Random Variables

Since the problems examined by this research model the stochastic
elements of $\omega$ and $T$ with discrete distributions, the LH assumption of continuous
variates in (4.32) must be adjusted. McKay (1988) suggests such a modification
by allocating the discrete values of the random variable $v^p$ proportionally to their
probabilities within its associated column vector $U_{\cdot p}$.

Definition. Let the random variable $v^p$ take on D discrete realizations $w_d$,
$d = 1, \ldots, D$. Furthermore, let $U_{\cdot p}$ denote its associated column vector in
$U_i$ for the $i$th sample $v_i^p$, and its dimension T represent the stratification size
of $U_i$.

Definition. Let $p_d = \mathrm{Prob}\{v^p = w_d\}$, where $\sum_{d=1}^{D} p_d = 1$.

Since the exact allocation of any $w_d$ ($p_d \cdot T$) will most likely be non-integer, it can
be partitioned into its integer and fractional components

$p_d \cdot T = \mathrm{Int}(p_d \cdot T) + f_d$  (4.33)

such that each $w_d$ of $v^p$ will have at least $\mathrm{Int}(p_d \cdot T)$ elements in $U_{\cdot p}$ for each
sample i of size T. It follows directly that

$T = \sum_{d=1}^{D} \mathrm{Int}(p_d \cdot T) + \sum_{d=1}^{D} f_d$  (4.34)

and since both T and $\mathrm{Int}(p_d \cdot T)$ are integer

$F^p = \sum_{d=1}^{D} f_d$  (4.35)

is also an integer representing the remaining slots in $U_{\cdot p}$ that can be filled by
additional $w_d$ beyond their guaranteed quota of $\mathrm{Int}(p_d \cdot T)$ (McKay 1988).

McKay's method introduces the column vector independence of $U_{\cdot p}$ first
by randomly sampling on the unit interval (0,1) $F^p$ times and allocating the
additional $w_d$ based on the proportion

$\frac{f_d}{F^p}$  (4.36)

where

$\sum_{d=1}^{D} \frac{f_d}{F^p} = 1$.  (4.37)

The resulting selection of size T using $\mathrm{Int}(p_d \cdot T)$ and the random sampling just
discussed is then independently ordered within the column vector $U_{\cdot p}$ using the
permutation function $\pi_p(\cdot)$. This process repeats for each LH sample i, $i = 1, \ldots,
I$, by which $w_d$ will appear on average in $U_{\cdot p}$ according to the frequency
described by (4.33) (McKay 1988). Also note that $U_{tr}$ of $U_i$ in the discrete
context represents the actual random variable, instead of a random variate drawn
on the unit interval requiring (usually) an inverse transformation.
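
The allocation in (4.33) through (4.37) can be sketched as follows. This Python fragment is an illustration under stated assumptions rather than McKay's own code: the function name `mckay_discrete_column` is hypothetical, a small tolerance guards the integer part against floating-point round-off, and the extra slots are drawn with `random.choices` weighted by $f_d$ (which is equivalent to sampling with probabilities $f_d / F^p$).

```python
import math
import random


def mckay_discrete_column(values, probs, num_strata, rng):
    """Build one column of T discrete realizations following (4.33)-(4.37).

    values     : the realizations w_d
    probs      : the probabilities p_d (summing to 1)
    num_strata : T, the stratification size
    rng        : a random.Random instance
    Returns (column, quotas, extra_slots).
    """
    # Guaranteed quota Int(p_d * T); the tolerance absorbs round-off such as 2.9999999.
    quotas = [math.floor(p * num_strata + 1e-9) for p in probs]
    # Fractional parts f_d, clamped at zero to avoid tiny negative weights.
    fracs = [max(0.0, p * num_strata - q) for p, q in zip(probs, quotas)]
    extra_slots = num_strata - sum(quotas)  # F^p in (4.35)
    column = []
    for w, q in zip(values, quotas):
        column.extend([w] * q)
    if extra_slots > 0:
        # Fill the remaining slots in proportion to f_d / F^p.
        column.extend(rng.choices(values, weights=fracs, k=extra_slots))
    rng.shuffle(column)  # random ordering within the column
    return column, quotas, extra_slots


column, quotas, extra = mckay_discrete_column(
    [900, 1000, 1100, 1200], [0.15, 0.45, 0.25, 0.15], 10, random.Random(0))
```

With the four-point distribution and T = 10 used here, the quotas are 1, 4, 2, and 1, leaving two slots to be filled at random with equal weight on each realization.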

This section completes its description of the LH technique with an
example $U_i$ for APL1P. Setting the sample size T = 10, Table 4.1 gives the
$\mathrm{Int}(p_d \cdot T)$, $f_d$, $f_d/F^p$, and associated assigned portions of the unit interval for the
three different discrete distributions followed by $\xi$ and $\omega$. Figure 4.1 gives a
sample $U_i$ based on the parameters in Table 4.1; a random selection of two
additional realizations for $\omega^j$, j = 1, 2, 3, which are (1100,1000), (1100,1000), and
(1100,1200), respectively; and, the random permutation function $\pi_r(\cdot)$.

Table 4.1

LH/McKay Distribution of APL1P Stochastic Parameters for T = 10

$v^p$       $w_d$    Prob{$v^p = w_d$}   Int($p_d T$)   $f_d$   $f_d/F^p$   Proportion of (0,1) Itvl.

$\xi^1$     1.0      .2                  2              .0      .0          -
            .9       .3                  3              .0      .0          -
            .5       .4                  4              .0      .0          -
            .1       .1                  1              .0      .0          -

$\xi^2$     1.0      .1                  1              .0      .0          -
            .9       .2                  2              .0      .0          -
            .7       .5                  5              .0      .0          -
            .1       .1                  1              .0      .0          -
            .0       .1                  1              .0      .0          -

$\omega^j$  900      .15                 1              .5      .25         .0 - .25
            1000     .45                 4              .5      .25         .25+ - .5
            1100     .25                 2              .5      .25         .5+ - .75
            1200     .15                 1              .5      .25         .75+ - 1.0

t     $U_{\cdot\xi^1}$   $U_{\cdot\xi^2}$   $U_{\cdot\omega^1}$   $U_{\cdot\omega^2}$   $U_{\cdot\omega^3}$

1     .5     .7     1000    900     900
2     1.0    .1     1000    1100    1200
3     .5     .7     900     1000    1100
4     .9     1.0    1100    1000    1000
5     1.0    .9     1000    1000    1000
6     .5     .7     1000    1100    1000
7     .1     .9     1100    1200    1200
8     .5     .0     1100    1100    1100
9     .9     .7     1200    1000    1100
10    .9     .7     1000    1000    1000

Figure 4.1.  Sample $U_i$ LH Matrix (T = 10)

4.2.4  Example Variance Reduction Results

Table 4.2 and Figure 4.2 provide variance reduction results for APL1P
using CV, LH, and random sampling (RS) methods. Since the search techniques
and response surface procedure use a single estimate $\hat{Z}(x_k)$ for each $x_k$, this
example compares the accuracy of the sample estimators and magnitude of the
sample variance under the following definitions.

Definition. Where $\hat{Z}_s(x_k)$ represents an unbiased estimator for $Z(x_k)$ as
previously defined for sample size I using sampling technique $s \in
\{RS, CV, LH\}$, let $\hat{Z}_s(x_k)_n$ be the $n$th unbiased estimate independently
sampled from any other estimate of $Z(x_k)$.

Since $\hat{Z}_s(x_k)_n$ itself is a random variable let

$\bar{Z}_s(x_k)_N = (N)^{-1} \cdot \sum_{n=1}^{N} \hat{Z}_s(x_k)_n$  (4.38)

represent the mean for N independent sample estimates of $Z(x_k)$ using sampling
technique s, while

$s^2_s(x_k)_N = (N - 1)^{-1} \cdot \sum_{n=1}^{N} (\hat{Z}_s(x_k)_n - \bar{Z}_s(x_k)_N)^2$  (4.39)

defines the sample variance of the estimator $\hat{Z}_s(x_k)_n$. These definitions parallel
the previous formulations (4.1) and (4.3) for a single estimator $\hat{Z}_s(x_k)$ of $Z(x_k)$,
except that here the variance of interest is the one associated with the estimator
itself. Justification for such a comparison follows directly from the fact that as an
unbiased estimate of $Z(x_k)$, any $\hat{Z}_s(x_k)$ drawn from the sampling technique with
the smallest $s^2_s(x_k)$ will, on average, be closer to $Z(x_k)$ than $\hat{Z}_t(x_k)$, $t \ne s$ (see
Law and Kelton 1991).
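
The summary statistics in (4.38) and (4.39), together with the proportional variance reduction relative to random sampling, amount to a few lines of Python. The function names are hypothetical, and the reduction measure is assumed here to be one minus the ratio of the two sample variances.

```python
import statistics


def summarize_estimates(estimates):
    """Return the mean (4.38) and sample variance (4.39) of N independent estimates."""
    return statistics.fmean(estimates), statistics.variance(estimates)


def variance_reduction(var_rs, var_vrt):
    """Proportional reduction in variance relative to random sampling."""
    return 1.0 - var_vrt / var_rs


# Toy illustration with three made-up estimates.
mean_est, var_est = summarize_estimates([1.0, 2.0, 3.0])
# A technique with variance 44 against an RS variance of 623 (same units).
reduction = variance_reduction(623.0, 44.0)
```

Because the ratio is unit-free, it does not matter whether the two variances are expressed in raw units or in thousands, provided both use the same scale.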

For comparison purposes in APL1P the random sample and control variate
estimators $\hat{Z}_{RS}(x_k)_n$ and $\hat{Z}_{CV}(x_k)_n$ use a sample size I = 50. The CVs use $\xi^1$, $\xi^2$,
and $\omega^1$ as the control variates with a jackknife partition G = 5 and H = 10.
Lacking a preliminary screening design, these controls are selected based on the
direct influence of $\xi^1$ and $\xi^2$ on the amount of resource $x^1$ and $x^2$, while adding $\omega^1$
due to its association with the highest recourse cost coefficients. (Adjusting
previous notation, let $\hat{Z}_{CV}(x_k)_n = \hat{Z}_G(x_k)_n$; i.e., the CV designation assumes the
jackknifed adjustment.) The Latin Hypercube estimator $\hat{Z}_{LH}(x_k)_n$ uses a
stratification size T = 50 as well, giving it the same sample size as
RS and CV. For vectors $x_k$, $k = 1, \ldots, 5$, ten independent estimators $\hat{Z}_s(x_k)_n$ ($s \in
\{RS, CV, LH\}$, N = 10) provide the basis for calculating $\bar{Z}_s(x_k)_N$ and $s^2_s(x_k)_N$.

The results in Table 4.2 show a dramatic reduction in the variance of the
estimator for both CVs and LHs, although in general LHs gave both the lowest
variance and the most accurate estimate of $Z(x_k)$ (with the exception of CVs for x
= (2700,2700)). Similarly, Figure 4.2 graphically demonstrates the LH (and to a
lesser extent the CV) estimates varying far less than their RS counterparts about a
more accurate assessment of Z(x) by plotting the individual $\hat{Z}_s(x_k)_n$ used to
calculate Table 4.2 against $Z(x_k)$ and their respective $\bar{Z}_s(x_k)_N$. Indeed, these plots
can be thought of as sample distributions of the various estimation techniques $s \in
\{RS, CV, LH\}$, for N = 10, and $\hat{Z}_s(x_k)$ as a single sample. From inspecting Figure
4.2 and knowing the true mean $Z(x_k)$, LH sampling clearly stands out as the most
accurate method.

Table 4.2

Comparison of Estimator Accuracy and Variance for APL1P

$x_k$          Z(x)    $\bar{Z}_{RS}$   $s^2_{RS}$*   $\bar{Z}_{CV}$   $s^2_{CV}$*   %†     $\bar{Z}_{LH}$   $s^2_{LH}$*   %†

(1800,1800)    24689   25065    623     24875    44      .93    24677    15      .98
(900,900)      26425   26989    229     26795    49      .79    26450    6       .98
(900,2700)     25131   25646    856     25424    38      .96    25123    6       .99
(2700,900)     25299   25806    826     25395    51      .94    25309    12      .99
(2700,2700)    27499   27576    342     27552    35      .90    27618    61      .82

* - in thousands    † - % variance reduction from RS

Figure 4.2.  Scatterplot of $\hat{Z}_s(x_k)_n$ [•] and $\bar{Z}_s(x_k)_N$ [-] for
Population, RS, CV, and LH Sampling Techniques

Figure 4.2 also gives the most powerful evidence of how the variability of
$\hat{Z}_{RS}(x_k)_n$ can badly mislead either the search or the response surface
approximation. For instance, examining a near optimal solution x = (1800,1800)
finds one extreme of $\hat{Z}_{RS}(x_k)$ = 26186 while the other extreme is 23449 (the true
population mean, Z(x = (1800,1800)) = 24689, is designated POP). Most
importantly, both results are far more likely to occur as $\hat{Z}_{RS}(x_k)$ samples than
either $\hat{Z}_{CV}(x_k)$ or $\hat{Z}_{LH}(x_k)$ for estimates of Z(x). Furthermore, the consequences of
such error become obvious given $Z(x^*)$ = 24642 and the 'flatness' of the region of
optimality. If either estimate had been used in a search technique, it would have
misdirected the process in a region sensitive to sampling errors. Similarly, in the
context of a response surface design, had the lower estimate constituted a non-central
design point the regression would assume the objective function value for
(1800,1800) was lower than the presumed optimal centerpoint and fit the
appropriate (but incorrect) non-convex saddle surface.

4.3  Experimental  Design 
4.3.1  Introduction 

Once  the  search  process  has  found  one  or  more  optimal  (or  near-optimal) 
solutions,  the  next  step  requires  deriving  a  polynomial  approximation  of  the 
response  Z(x)  to  changes  in  x.  Since  no  two  problems  will  exhibit  the  same 
response  characteristics,  this  dissertation  cannot  give  precise  guidance  on  issues 
associated with constructing the design (i.e., factor selection, factor ranges, and
type of design) used to calculate the response approximation; nor can it turn to the
stochastic  programming  literature  for  help  since  this  idea  has  not  been  tried  before 
in  the  context  of  linear  programming  with  recourse.  Therefore,  this  section  will 
present  a  general  approach  for  constructing  designs  as  applied  to  (3.1a)  based  on 
the  experimental  design  and  simulation  literature,  and  leave  the  details  to  the 
specifics  of  the  problem. 

Instead  of  constructing  a  design  to  derive  the  response  surface,  one 
possibility  would  be  to  simply  estimate  a  regression  model  based  on  the  data 
collected  by  the  search  process.  However,  such  an  unstructured  procedure  would 
not  provide  as  good  an  approximation  as  a  more  formal  experimental  design  for 
several  reasons.  Quoting  Law  and  Kelton 


In  simulation,  experimental  design  provides  a  way  of  deciding  before  the 
runs  are  made  which  particular  configurations  to  simulate  so  that  the 
desired  information  can  be  obtained  with  the  least  amount  of  simulating. 
Carefully  designed  experiments  are  much  more  efficient  than  a  "hit-or- 
miss"  sequence  of  runs  in  which  we  simply  try  a  number  of  alternative 
configurations  unsystematically  to  see  what  happens  (Law  and  Kelton 
1991,  657). 


A  better  approach  would  treat  the  search  data  as  a  type  of  reconnaissance 
information  that  gives  us  much-needed  insight  into  the  basic  behavior  of  the 
recourse  problem  and  provides  guidance  on  how  to  structure  the  formal 
experimental  design.  Specifically,  this  preliminary  information  directs  the 
centering  of  the  design  based  upon  the  location  of  the  optimal  solution;  and, 
through  its  rudimentary  knowledge  of  the  relationship  between  x  and  Z(x), 
restricts  the  design  (and  associated  response  surface)  to  the  region  of  optimality. 
By using this two-step method (preliminary screening and experimental design)
one can easily assess the accuracy of the resulting polynomial approximation
using  the  characteristics  of  the  specific  type  of  design  in  conjunction  with  the 
known  convexity  of  Z(x). 

Although  the  topic  of  experimental  design  encompasses  an  immense 
literature  (Steinberg  and  Hunter  1984),  this  research  focuses  on  its  application 
within  the  realm  of  simulation.  Law  and  Kelton  (1991)  cite  three  specific 
advantages  simulation  experiments  have  over  their  physical  or  industrial 
counterparts: 

1.  Control.  Unlike  a  physical  environment,  simulation  experiments  control 
the  level  or  value  of  the  input  factors.  In  the  recourse  context,  this  means 
the  first-stage  decision  vector  x. 

2.  Variability.  Controlling  the  source  of  variability  —  the  random  number 
generators  —  allows  for  the  application  of  the  VRTs  explained  in  the 
previous  section. 

3.  Randomization.  Unlike  physical  environments,  where  systematic  error 
requires  repeated  experiments  under  the  same  input  values  (termed 
replications),  a  single  estimate  of  sufficient  sample  size  for  each  set  of 
input  factors  will  suffice  (Law  and  Kelton  1991). 

These  advantages  in  turn  lead  to  designs  whose  estimates  of  the  polynomial 
parameters are both relatively simple to calculate and highly efficient (Box and
Draper  1987). 

This section will introduce the experimental designs used for this research
by first reviewing their terminology and assumptions, then presenting the type of
designs  used  to  derive  the  response  surfaces,  and  concluding  with  a  simple 
example  for  APL1P.  Topics  dealing  specifically  with  response  surface  estimation 
are  deferred  until  Section  4.4. 

4.3.2  Terminology  and  Assumptions 

This  dissertation  uses  the  following  terminology  from  the  experimental 
design  literature  as  applied  in  the  context  of  the  recourse  problem. 

Definition. Let k index a partition of size K of the decision vector $x \in X$
such that $x = [x^1, \ldots, x^k, \ldots, x^K \mid x^{K+1}, \ldots, x^n]^T = [x_K \mid x_{K'}]^T$,
where n is the dimension of x in (3.1a) (note that k no longer refers to a
vector $x_k$ in the context of the search sequence).

Definition. Let the term factor be synonymous with $x^k$, $k = 1, \ldots, K$, in
$x_K$. The subset of factors $x_K$ constitutes an assumption about which variables
in x significantly influence Z(x). Since the non-factors $x^{K+1}, \ldots, x^n$ in
$x_{K'}$ are held constant, the experimental design consists solely of changes
in the factor values as inputs, with $Z_S(x)$ as the output (or response).

Definition. The literature refers to levels as the values of the factors,
which in (3.1a) correspond to the values of $x^k \ge 0$. An N-level design
restricts each factor to N levels in the experimental design. This research
uses only two-level designs (disregarding center point and axial values),
and in keeping with conventional notation designates the lowest value for
$x^k$ as '-' and the highest as '+'.

Definition. Center point refers to the intermediate or average value of $x^k$
and is represented as '0'; by extension, the center point of the experimental
design for $x_K$ is '0'.

Definition. Design point refers to a specific combination of factor levels
for $x_K$.

Definition. Axial point denotes a design point whose factor levels are all
center points '0' except for $x^k = \pm\alpha^k$, where $\alpha^k$ is greater than the
value for $x^k$ represented by '+' (and by symmetry $-\alpha^k$ is less than the
value represented by '-').

Definition. Response refers to the output of the simulation, in this case
the estimator $Z_S(x)$ for Z(x). The response surface approximation V(x)
refers to a second-order polynomial approximation

$$V(x_K) = a_0 + \sum_{k=1}^{K} a_k x^k + \sum_{j=1}^{K} \sum_{i \ge j}^{K} a_{ij} x^i x^j \qquad (4.40)$$

for a specified range of $x_K$, where $Z(x) = V(x_K) + \varepsilon(x_K)$, and $\varepsilon(x_K)$ is error
due to random variability and lack of fit.

Definition. Replication refers to a single simulation and its associated
response at a given design point. In the recourse context, the estimator
$Z_S(x)$ for the sampling technique $S \in \{RS, CV, LH\}$ represents a single
replication for the design point defined by the factor levels of $x_K$.

Definition. Letting $r^k$ represent the absolute value of the difference
between the high value '+' and low value '-' for $x^k$, and $c^k$ the value of the
center point for $x^k$, the coded value $x_c^k$ for $x^k$ is

$$x_c^k = \frac{x^k - c^k}{r^k / 2} \qquad (4.41)$$

By this definition, the coded values for the two-level design points are +1
and -1, the center points 0, and the axial points $\alpha^k > 1$, $-\alpha^k < -1$.
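
The coding in (4.41) can be sketched in a few lines of Python. This is only an illustration: the function names `code_factor` and `decode_factor` are hypothetical, and the sketch assumes the scaling constant is the half-range $r^k/2$, so that the '-' and '+' levels map to -1 and +1 as stated above. The factor levels are the APL1P values listed in Table 4.3.

```python
def code_factor(x, low, high):
    """Map an uncoded factor value to its coded value (assumed half-range scaling)."""
    center = (low + high) / 2.0       # c^k, the center point
    half_range = (high - low) / 2.0   # r^k / 2, half the distance between '-' and '+'
    return (x - center) / half_range


def decode_factor(x_coded, low, high):
    """Inverse mapping: recover the uncoded value from a coded value."""
    center = (low + high) / 2.0
    half_range = (high - low) / 2.0
    return center + x_coded * half_range


# APL1P levels from Table 4.3: x1 in [1000, 2600], x2 in [770, 2370]
print(code_factor(1000, 1000, 2600))   # low level of x1
print(code_factor(2931, 1000, 2600))   # upper axial point of x1
print(decode_factor(-1.414, 770, 2370))  # lower axial point of x2, uncoded
```

Under this assumption the axial values 669 and 2931 for x1 code to roughly -1.414 and +1.414, which agrees with the coded columns of Table 4.3.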


With  these  basic  definitions,  the  literature  consists  of  a  wide  variety  of 
experimental  design  structures  (Central  Composite,  Box-Behnken,  Mixture, 
Plackett-Burman,  to  name  a  few)  that  provide  their  associated  response  surface 
approximations  with  different  degrees  of  higher  order  interactions,  confounding, 
coefficient  variance,  response  variance,  and  orthogonality.  This  dissertation 
restricts  its  investigation  to  a  widely  used  design  —  the  central  composite  (CC) 
design  —  as  the  principal  method  (with  variations)  for  deriving  the  polynomial 
approximations  of  Z(x)  (Box,  Hunter,  and  Hunter  1978,  Diamond  1989,  Kleijnen 
1987,  Law  and  Kelton  1991,  Lorenzen  and  Anderson  1993,  or  Montgomery 
1984). 

4.3.3  Central  Composite  Design 

Following Montgomery (1984), the CC design represents a very popular
design for fitting second-order models, and contains three basic components: a
'core' $2^K$ factorial or fractional design, 2K axial points, and $n_K$ center points.

1. Factorial Design. The $2^K$ factorial design consists of all combinations of
factor levels for a two-level design; graphically, it would represent the four
corner points of a square for 2 variables, the eight corner points of a cube
for 3 variables, etc. A $2^{K-p}$ fractional design consists of a subset of the
full factorial design (retaining $2^{K-p}$ of the $2^K$ design points), and is
referred to as a $1/2^p$ fractional design (discussed in more detail below).

2. Axial Points. Continuing with the geometric analogy, the axial points
represent design points projected through the center of each side of the
hypercube. Mathematically stated, the sets of axial points would be
$\{\pm\alpha^1, 0, \ldots, 0\}$, $\{0, \pm\alpha^2, 0, \ldots, 0\}$, ..., $\{0, 0, \ldots, \pm\alpha^K\}$, and for a CC
design of K factors constitute 2K design points.

3. Center Points. The center points represent $n_K$ replications of the design
point $\{0, 0, \ldots, 0\}$.

This configuration provides the three minimum levels for each factor necessary to
estimate a quadratic model, and depending on the selection of the values for $\alpha^k$
and $n_K$ gives the response approximation the following characteristics.

1. Rotatable. The variance of the predicted response $V(x_K)$ varies only with
the distance, and not the direction, from the center point. The value of $\alpha$
determines this attribute of the design.

2. Orthogonal. This condition occurs whenever the off-diagonal elements of
$X^T X$ are zero, and is a function of the value of $n_K$. In the CC design
context it minimizes the variance of the regression coefficients $a_{ij}$.

3. Uniform Precision. Under this property, controlled by the size of $n_K$, the
variance of the response $V(x_K)$ is consistent throughout the hypercube
defined by the factorial design points. Note that the value of $n_K$ for a
uniform precision design can differ from that for an orthogonal one.
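
The three components of a CC design can be generated mechanically, as the following Python sketch suggests. The function name `central_composite` is hypothetical. When no axial distance is supplied, the sketch assumes the common rotatability choice of the fourth root of the number of factorial points; for K = 2 this gives about 1.414, the axial value that appears in Table 4.3, although the source does not state how that value was chosen.

```python
import itertools


def central_composite(K, alpha=None, n_center=1):
    """Return coded design points: 2^K factorial, 2K axial, n_center center points."""
    factorial = [list(p) for p in itertools.product((-1.0, 1.0), repeat=K)]
    if alpha is None:
        # Assumed rotatable choice: fourth root of the number of factorial points
        alpha = len(factorial) ** 0.25
    axial = []
    for k in range(K):
        for sign in (-1.0, 1.0):
            point = [0.0] * K
            point[k] = sign * alpha   # only factor k leaves the center
            axial.append(point)
    center = [[0.0] * K for _ in range(n_center)]
    return factorial + axial + center


design = central_composite(2)
for point in design:
    print(point)
```

For two factors the nine generated points follow the same order as the coded columns of Table 4.3: four factorial points, four axial points, and one center point.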

Another  often  cited  characteristic  of  CC  designs  observes  that  they  can  be  built  up 
from  simpler  first-order  designs  that  use  just  the  factorial  portion  (Montgomery 
1984).  However,  given  the  known  convexity  of  Z(x)  this  research  assumes  that  a 
positive  definite  quadratic  approximation  will  always  be  required  in  the  region  of 
optimality,  and  therefore  will  proceed  directly  with  some  version  of  a  CC  design. 

Resolution,  another  important  concept  associated  with  experimental 
designs,  measures  the  level  of  factor  interaction  the  polynomial  approximation 
can independently determine. Recalling the definition of the $2^{K-p}$ fractional as a
subset of the full factorial design retaining $2^{K-p}$ of the $2^K$ design points, the
design resolution denotes the level of confounding between certain factor
interactions that occurs in fractional designs. In the context of problem (3.1), for
example, this means that if both interactions $x^g x^h$ and $x^i x^j x^k$ significantly
affect Z(x), certain fractional designs could not discern one effect from the other.

Definition. Define the resolution level R (denoted using Roman
numerals) as one where every r-factor interaction can be independently
determined from any other factor interaction containing fewer than R - r
factors (Box, Hunter, and Hunter 1978).


Using the previous example, as two-level and three-level interactions,
respectively, confounding of $x^g x^h$ and $x^i x^j x^k$ implies no greater than a resolution V
fractional  design  since  the  difference  of  the  design  level  (5)  and  interaction  level 
of  the  first  factor  set  (2)  is  not  less  than  the  level  of  the  second  factor  set  (3).  By 
contrast,  for  a  resolution  V  design  main  factor  effects  are  not  confounded  with 
two-way  interactions  (5-1=4,  which  is  greater  than  2),  nor  would  two-way 
interactions  be  confounded  with  other  two-way  interactions  (5-2  =  3,  which  is 
still  greater  than  2).  Note  that  only  the  full  2K  factorial  design  provides  complete 
factor  interaction  estimates.  (Also  see  Box  and  Hunter  1961.) 

One  problem  associated  with  the  CC  design  stems  from  the  exponential 
growth  in  the  factorial  portion  that  occurs  with  an  increase  in  the  number  of  input 
factors  K.  For  smaller  problems  (such  as  APL1P,  CEP1,  PGP2  and  even  4TERM) 
such growth is manageable; however, for larger problems (e.g., 20TERM) a full
factorial CC design cannot be accomplished and analyzed in a reasonable period
of time. For example, 20TERM requires $2^{63} = 9.223 \times 10^{18}$ runs just for the full
factorial  portion  of  the  design.  Consequently,  this  research  employs  two  basic 
tactics  to  reduce  the  CC  design  to  a  manageable  level  for  the  larger  problems  — 
preliminary  screening  and  fractional  factorial  designs. 
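
The growth in design size is easy to tabulate. The short Python sketch below counts the runs of a CC design; the helper name `cc_runs` and its parameters are hypothetical, and K = 63 for 20TERM is taken from the exponent quoted above.

```python
def cc_runs(K, n_center=1, p=0):
    """Runs in a CC design: 2^(K-p) factorial + 2K axial + n_center center points."""
    return 2 ** (K - p) + 2 * K + n_center


print(cc_runs(2))                 # 9 runs for a two-factor design such as APL1P
print(2 ** 63)                    # 9223372036854775808 factorial runs for K = 63
print(cc_runs(63, p=50))          # a heavily fractionated design is far smaller
```

The last line only illustrates how quickly fractionation shrinks the factorial portion; it does not imply that a particular fraction has the resolution required for a quadratic fit.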

4.3.3.1 Preliminary Screening

As explained by Box and Draper (1987) the orthogonal (or near-orthogonal)
property available through CC designs provides far more precise
estimates of the regression coefficients $a_{ij}$, which give smaller, but more
inclusive, individual and joint confidence regions. For this reason alone a
regression  based  on  the  data  collected  from  the  search  process  would  not  provide 
as  accurate  a  polynomial  approximation.  However,  the  optimal  search  process 
can  provide  preliminary  information  to  assist  the  development  of  the  CC  design  in 
varying  degrees,  depending  upon  the  size  of  (3.1). 

First,  for  all  problems  the  search  technique  provides  an  optimal  solution 
that  determines  the  center  point  of  the  design.  Centering  the  design  around  an 
optimal  solution  gives  two  principal  advantages: 

1.  Region  of  Optimality.  The  purpose  of  the  response  surface  analysis  is  to 
find  a  second-order  polynomial  approximation  of  the  optimum 
neighborhood,  which  will  often  be  just  a  small  portion  of  X.  Centering  the 
design  guarantees  that  the  regression  approximates  the  correct  area  of 
interest. 

2.  Convexity.  The  literature  gives  this  analysis  the  advantage  of  knowing  the 
true  surface  of  Z(x)  to  be  convex,  which  implies  that  the  fitted  polynomial 
should  also  be  convex.  Therefore,  centering  the  design  on  a  'side'  of  the 
surface  (instead  of  the  'bottom')  —  in  conjunction  with  Zs(x)  error,  the 
'flatness'  and  shape  of  Z(x),  and  the  design  resolution  —  increases  the  risk 
of the regression fitting a saddle surface. (Box and Draper (1987) give an
excellent  discussion  of  this  phenomenon  in  Chapter  11.) 

For  these  reasons,  design  location  is  very  crucial;  therefore,  analysis  of  the  search 
data  complements  the  'traditional',  more  formal  screening  designs  used  in  the 
experimental  literature  (e.g.,  Plackett  and  Burman  1946).  Such  designs  would  not 
be  appropriate  without  some  knowledge  of  the  optimal  solution;  and,  in  any  case 
would  be  difficult  to  construct  given  the  size  of  X. 

Another  reason  for  using  the  optimal  search  data  in  a  preliminary  analysis 
concerns the range of the experimental design factors $x_K$. Since the purpose of the
response  analysis  is  to  approximate  the  region  of  optimality,  the  design  does  not 
need  to  stretch  over  the  entire  set  X.  (Indeed,  such  a  fit  would  strain  the 
assumption  of  a  quadratic  approximation  for  Z(x).)  For  smaller  problems, 
determining  the  range  (i.e.,  the  actual  values  for  xk  denoted  by  '-'  and  '+')  presents 
no  challenge.  However,  bigger  problems  will  most  likely  require  a  formal 
screening  design  process. 

Highly fractionated designs of resolution III and IV present one method
for determining the composition and factor ranges of $x_K$ using data from the
optimal search process. According to Box, Hunter, and Hunter (1978) the
Plackett-Burman designs (1946) offer a way to determine the main effects of K
factors in N = K + 1 runs, where N is a multiple of 4. Similarly, for N as a power
of 2, resolution III designs can also be built for K = N - 1 factors in N runs by
saturating a $2^d$ factorial design, where d = ln(N)/ln(2). If a resolution IV design is
needed (i.e., main effects not aliased with two-way interactions), then a foldover
of a resolution III design (where the resolution III design is replicated with the
'-' and '+' design points reversed and then added to the original design) gives such a
design for 2N runs (Box, Hunter, and Hunter 1978). Assuming Z(x) contains no
complicated  higher-order  interactions,  resolution  IV  designs  should  adequately 
characterize  xK.  Finally,  note  that  these  designs  only  estimate  linear  effects.  They 
are  equivalent  to  the  'core'  factorial  portion  of  a  CC  design,  and  thus  do  not 
include  axial  or  center  points. 
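
The saturation and foldover constructions described above can be illustrated with a small Python sketch. All function names are hypothetical, and the sketch assumes the usual construction in which each additional factor is assigned to one interaction column of the base $2^d$ factorial. For d = 3 this yields K = 7 factors in N = 8 runs.

```python
import itertools


def saturated_res3(d=3):
    """Saturate a 2^d factorial: one column per non-empty subset of the d base factors."""
    base_runs = list(itertools.product((-1, 1), repeat=d))
    subsets = [s for r in range(1, d + 1)
               for s in itertools.combinations(range(d), r)]
    design = []
    for run in base_runs:
        row = []
        for subset in subsets:
            value = 1
            for i in subset:
                value *= run[i]       # product of the base columns in this subset
            row.append(value)
        design.append(row)
    return design


def foldover(design):
    """Append a copy of the design with every '-' and '+' level reversed."""
    return design + [[-v for v in row] for row in design]


def aliased_pairs(design):
    """Count (main effect, two-way interaction) pairs with non-orthogonal columns."""
    K = len(design[0])
    count = 0
    for i in range(K):
        for j, k in itertools.combinations(range(K), 2):
            if i in (j, k):
                continue
            if sum(row[i] * row[j] * row[k] for row in design) != 0:
                count += 1
    return count


base = saturated_res3(3)
folded = foldover(base)
print(len(base), aliased_pairs(base))       # 8 runs; main effects aliased with interactions
print(len(folded), aliased_pairs(folded))   # 16 runs; no main effect aliased with a two-way interaction
```

The foldover removes the aliasing because the product of three sign columns changes sign in the reversed half, so the two halves cancel.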

Finally,  group  screening  offers  another  approach  for  removing  non- 
important  factors  before  building  a  CC  design.  Kleijnen  (1987)  describes  this 
method  where  individual  factors  are  consolidated  into  groups  and  subsequently 
treated  as  a  single  factor  in  the  screening  design.  If  any  ensuing  group  effect  is 
inconsequential,  then  all  the  factors  in  such  a  group  are  assumed  insignificant. 

4.3.3.2  Fractional  Factorial  Designs 

Once the subset of influential factors $x_K$ has been set, the next step
requires  estimating  the  second-order  polynomial  equation.  Assuming  that  no 
third-order  or  higher  interactions  occur  (which  is  equivalent  to  the  fundamental 
assumption  this  thesis  makes  regarding  the  adequacy  of  a  quadratic 
approximation),  fractional  CC  designs  provide  another  obvious  method  for 
reducing  the  size  of  the  experiment.  However,  adequately  estimating  a  quadratic 
function  with  the  assurance  of  no  confounding  of  the  two-way  interactions  with 
each  other  requires  at  least  a  resolution  V  design  (Box  and  Draper  1987). 
Whitwell and Morbey (1961) note that resolution V designs require a minimum
[1 + K(K + 1)/2] factorial design; when combined with 2K axial points and $n_K$
center points, such designs should handle up to K = 25 first-stage variables.
Assuming  that  a  reasonable  reduction  from  the  original  x  occurs  in  the 
preliminary  screening  phase,  resolution  V  designs  will  be  adequate  for  most 
problems. 

For situations where K > 25, resolution III* designs provide another way
of estimating a quadratic polynomial. As described by Draper and Lin (1990),
resolution III* designs essentially use the axial points of a CC design to de-alias
the confounding effects between the one-way and two-way interactions of the
factorial portion (also see Hartley 1959). Their article also shows how resolution
III* designs can be derived from resolution V designs.

4.3.4  Example  Experimental  Design 

As  a  small  problem,  APL1P  does  not  require  either  a  preliminary 
screening  design  or  a  fractional  factorial  design.  Table  4.3  gives  the  CC  design 
for  APL1P,  while  Figure  4.3  shows  the  design  superimposed  on  the  contour  chart 
for  APL1P.  Table  4.4  gives  the  regression  results  derived  from  the  SAS  analysis 
for  both  coded  and  uncoded  variable  inputs.  Finally,  because  these  results 
represent  the  population  means  (hence  no  variance)  the  design  includes  only  one 
center  point  sample.  (Note  that  this  experimental  design  provides  the  contour 
representation  of  Z(x)  presented  throughout  Chapters  3  and  4  (e.g.,  Figure  4.3). 
Although  technically  incorrect  since  several  of  them  exceed  the  design  region,  the 
contours  are  nonetheless  used  for  illustrative  purposes.) 

Table 4.3

Central Composite Design for APL1P

    Uncoded x_K           Coded x_c          Response
    x1       x2          x1        x2          Z(x)

   1000      770        -1        -1         26580.3
   1000     2370        -1        +1         24862.3
   2600      770        +1        -1         25286.7
   2600     2370        +1        +1         26622.1
    669     1570        -1.414     0         25489.3
   2931     1570        +1.414     0         26166.5
   1800      439         0        -1.414     26183.3
   1800     2701         0        +1.414     25760.9
   1800     1570         0         0         24642.6

Table 4.4

Regression Results for CC Design in Table 4.3

Analysis of Variance

   Source    DF    Sum of Squares    Mean Square    R Square

   Model      5           4061687         812337       .9894
   Error      3             43673          14558
   Total      8           4105360

Parameter Estimates

   Variable      Coded Par. Est.    Uncod. Par. Est.

   Intercept             24643.0             33276.0
   x1                      178.0             -4.8969
   x2                     -122.5             -5.4860
   x1 * x2                 763.3              0.0012
   (x1)^2                  577.1              0.0009
   (x2)^2                  649.2              0.0010
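
As a cross-check on the coded estimates, the quadratic model can be refit to the Table 4.3 data by ordinary least squares. The Python sketch below is not the SAS procedure used in this research; it solves the normal equations with a small, hypothetical Gaussian-elimination helper named `solve`, and takes the coded axial values as 1.414 exactly as tabulated.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):                   # back substitution
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x


# Coded design points and responses from Table 4.3
points = [(-1, -1), (-1, 1), (1, -1), (1, 1),
          (-1.414, 0), (1.414, 0), (0, -1.414), (0, 1.414), (0, 0)]
z = [26580.3, 24862.3, 25286.7, 26622.1,
     25489.3, 26166.5, 26183.3, 25760.9, 24642.6]

# Model columns: intercept, x1, x2, x1*x2, x1^2, x2^2
rows = [[1.0, x1, x2, x1 * x2, x1 * x1, x2 * x2] for x1, x2 in points]

# Normal equations (X'X) b = X'z
XtX = [[sum(r[i] * r[j] for r in rows) for j in range(6)] for i in range(6)]
Xtz = [sum(r[i] * zi for r, zi in zip(rows, z)) for i in range(6)]
coef = solve(XtX, Xtz)

for name, value in zip(["Intercept", "x1", "x2", "x1*x2", "x1^2", "x2^2"], coef):
    print(f"{name:10s} {value:10.1f}")
```

The fit returns a coded x1 coefficient near 178.0, the value consistent with the transformed coefficients reported later in (4.52); small differences in the last digit of the other estimates can be expected from rounding of the axial values.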

Figure 4.3. Central Composite Experimental Design for APL1P
(Coded  Variable  Designation) 


4.4  Response  Surface  Analysis 
4.4.1  Introduction 

The  application  of  response  surface  methodology  (RSM)  to  the  two-stage 
stochastic  linear  programming  problem  with  recourse  represents  one  of  this 
dissertation's  major  contributions  to  this  area  of  research.  Specifically,  this  thesis 
adopts Box and Draper's (1987) advocacy of examining the type and nature of
factor dependence, a term they use to describe a response function characteristic
where  its  reaction  to  one  factor  is  not  independent  of  the  other  factor  levels.  Such 
factor  interaction  typically  generates  a  ridge  system  of  responses  that  can  take  on 
various  shapes  and  levels  of  stationarity,  symmetry,  and  attenuation.  This  type  of 
response  analysis  provides  the  best  insight  on  the  reaction  of  the  response  to 
changes  in  the  input  variables  for  the  following  reasons: 

1.  Alternative  Optima.  A  range  of  alternative  optimal  solutions  often  can  be 
found  along  a  maxima  or  minima  ridge,  where  changes  in  one  input 
variable  can  be  compensated  by  changes  in  another  with  no  loss  of 
optimality.  The  direction  of  the  ridge  measures  this  exchange  ratio,  and 
therefore  may  find  more  suitable  solutions  in  practical  or  subjective  terms. 

2.  Optimization  of  Second  Response.  Superimposing  a  second  response  on 
the  original  plot  allows  for  selecting  a  point  along  the  ridge  that  optimizes 
the  second  response. 

3.  Direction  of  Insensitivity.  Essentially  an  extension  of  item  (1),  the  ridge 
can  also  give  an  attenuation  direction  that  minimizes  departures  from  the 
optimal  solution. 

4.  Yield  Improvement.  For  rising  ridge  systems,  this  analysis  gives  an 
improving  direction. 

5. Underlying Mechanism. Factor dependence can supplement the analyst's
or  decision-maker's  knowledge  of  the  underlying  mechanisms  of  the 
problem  (Box  and  Draper  1987). 

Items  (1),  (3),  and  (5)  promise  to  be  the  most  advantageous  in  the  context  of  the 
recourse  problem  (3.1).  Secondary  response  surfaces  in  item  (2)  are  not 
investigated,  and  item  (4)  should  not  occur  if  the  experimental  design  is  properly 
centered.  Additionally,  this  research  will  also  emphasize  the  opposite  aspect  of 
item  (3);  i.e.,  where  not  to  go  by  finding  the  direction  of  maximum  sensitivity. 

Obviously,  for  higher-dimensional  problems  such  analysis  requires  an 
algebraic  description  of  the  ridge.  Box  and  Draper  (1987)  describe  the  technique 
of canonical analysis (CA) as one method of providing such a description. (In an
appendix they also review an alternative analytical method referred to as ridge
analysis by A. Hoerl (1959) and R. Hoerl (1985), which essentially functions as a
steepest ascent technique for second-order surfaces. This dissertation does not
employ this type of analysis, and its use of the term 'ridge analysis' refers to
identifying the minima and maxima ridge by way of canonical analysis of the
response surface advocated by Box and Draper (1987).) The outline of the CA
method in the following sections follows the one provided by Box and Draper
(1987) based on the two models they present: the 'A' and 'B' canonical forms.
Additionally,  this  study  demonstrates  the  application  of  these  methods  to  APL1P. 

4.4.2 A Canonical Form

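Adapting Box and Draper's (1987) presentation to the recourse problem,
describing the coefficients of (4.40) in the matrix forms

$$x_K = \begin{bmatrix} x^1 \\ x^2 \\ \vdots \\ x^K \end{bmatrix}, \quad
a = \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_K \end{bmatrix}, \quad
A = \begin{bmatrix}
a_{11} & \tfrac{1}{2}a_{12} & \cdots & \tfrac{1}{2}a_{1K} \\
\tfrac{1}{2}a_{12} & a_{22} & \cdots & \tfrac{1}{2}a_{2K} \\
\vdots & \vdots & \ddots & \vdots \\
\tfrac{1}{2}a_{1K} & \tfrac{1}{2}a_{2K} & \cdots & a_{KK}
\end{bmatrix}$$

allows the fitted second-order response surface approximation of Z(x) to be

$$V(x_K) = a_0 + x_K^T a + x_K^T A x_K. \qquad (4.43)$$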

(Recall that the experimental design uses a subset of first-stage decision variables
$x_K$ of size K, where $x = [x_K \mid x_{K'}]^T \in X$.) Letting $\lambda^k$ and $m^k$ represent the
eigenvalue and eigenvector, respectively, for A for k = 1, ..., K, then

$$A m^k = m^k \lambda^k. \qquad (4.44)$$

Where each eigenvector is normalized (i.e., $(m^k)^T m^k = 1$), the matrix M consists
of $m^k$, k = 1, ..., K, as its column vectors. Letting $\Lambda$ be a diagonal matrix whose
elements $\Lambda_{kk}$ are $\lambda^k$, k = 1, ..., K, gives

$$AM = M\Lambda. \qquad (4.45)$$

Since M is an orthonormal matrix, $M^T = M^{-1}$ and $MM^T = I$; consequently,
multiplying (4.45) with $M^T$ produces

$$M^T A M = \Lambda, \qquad (4.46)$$

and, using the second identity with the associative property, rewrites (4.43) as

$$V(x_K) = a_0 + (x_K^T M)(M^T a) + (x_K^T M) M^T A M (M^T x_K). \qquad (4.47)$$

Defining $X = M^T x_K$ and $\theta = M^T a$ restates (4.47) as

$$V_A(X) = a_0 + X^T \theta + X^T \Lambda X \qquad (4.48)$$


or equivalently

$$V_A(X) = a_0 + \sum_{k=1}^{K} \theta^k X^k + \sum_{k=1}^{K} \lambda^k (X^k)^2 \qquad (4.49)$$

where $\theta = [\theta^1, \ldots, \theta^K]^T$, $X = [X^1, \ldots, X^K]^T$, $V_A(X)$ represents the response
approximation for the transformed vector X, and by previous definitions
$V_A(M^T x_K) = V(x_K)$. Thus, the linear transformation (4.48) forms the A canonical
configuration where essentially an axis rotation eliminates the cross-product terms
in the original response approximation. Furthermore, as explained in more detail
shortly, the eigenvalues $\lambda^k$ indicate the type of surface (4.40) fits, the eigenvectors
$m^k$ denote the component contribution of the original axes to the rotated ones, and
$\theta^k$ measures the slope of the rotated axes from the origin of the original coordinate
system. Finally, setting the derivative of (4.49) with respect to $X^k$ to zero
produces

$$X_S^k = -\frac{\theta^k}{2\lambda^k}, \qquad (4.50)$$

which defines the stationary (minimum) point with respect to the rotated axes
(Box and Draper 1987).

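Using the coded parameter estimates from Table 4.4, an A canonical
analysis for APL1P produces

$$A = \begin{bmatrix} 577.082 & 381.670 \\ 381.670 & 649.194 \end{bmatrix}, \qquad (4.51)$$

$$\Lambda = \begin{bmatrix} 229.769 & 0.000 \\ 0.000 & 996.507 \end{bmatrix}, \quad
\theta = \begin{bmatrix} 214.089 \\ 29.195 \end{bmatrix}, \qquad (4.52)$$

the linear transformation

$$V_A(X) = 24643 + 214.09 X^1 + 29.19 X^2 + 229.8 (X^1)^2 + 996.5 (X^2)^2, \qquad (4.53)$$

and the stationary point in the rotated coordinates as

$$X_S = \begin{bmatrix} -214.09 / (2 \cdot 229.8) \\ -29.19 / (2 \cdot 996.5) \end{bmatrix}
= \begin{bmatrix} -.4658 \\ -.0146 \end{bmatrix}. \qquad (4.54)$$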

Figure 4.4 illustrates the A canonical analysis of APL1P. Since the coded
variables' axes (not shown) center around the optimal solution $x_0 = (1800, 1570)$,
so do the rotated axes $X^1$ and $X^2$ indicated by the solid lines. The axes $\tilde{X}^1$ and
$\tilde{X}^2$ displayed by the dashed lines and centered about the stationary point $X_S$
concern the B canonical analysis discussed shortly.

Figure 4.4. A and B Canonical Analysis for APL1P
(horizontal axis: x1 capacity, in hundreds)


Most  importantly,  the  canonical  transformations  of  (4.51-54)  can  provide 
an  algebraic  description  compatible  with  the  graphical  depiction  of  Figure  4.4. 
Again,  adapting  Box  and  Draper's  (1987)  general  description  of  canonical  analysis 
to  the  recourse  problem  shows 


1. Eigenvalues. The eigenvalues $\lambda^k$ measure the degree of slope in the
transformed coordinate system X, while the signs indicate the type of
contour surface (i.e., concave, convex, rising, saddle, etc.). Since the
literature has shown Z(x) is either a convex (minimization) or concave
(maximization) function of x, the $\lambda^k$ must be either strictly positive
(minimization) or strictly negative (maximization). Thus the eigenvalues
can not only tell the relative sensitivity of the fitted response $V_A(X)$ to
movement along the rotated axes, but can also confirm the validity of the
fit through their signs in the context of the recourse problem. In the case
of APL1P, both eigenvalues are significant, although curvature along the
rotated $X^2$ axis (996.5) is roughly four times as steep as that along $X^1$ (229.8).

2. Eigenvectors. The normalized eigenvectors of M describe the component
contribution of the original $x_K$ axes to the rotated axes X through the
relationship $X = M^T x_K$. In practical terms, this means the elements of the
eigenvectors provide the basis for estimating the factor dependence, or
tradeoffs, of the decision variables $x_K$. For instance, in the case of
APL1P a roughly equal presence of both decision variables $x^1$ and $x^2$ in the
eigenvectors for $X^1$ and $X^2$, (.7396, -.6730) and (.6730, .7396), respectively,
indicates an approximate 45° rotation of the fitted quadratic surface relative
to the original decision variables x.

3. Slope. The variable $\theta^k$ measures the slope of the fitted response in the
direction of the rotated axis $X^k$. For APL1P $X^1$ has a far higher linear
coefficient than $X^2$. In conjunction with the stronger curvature of $X^2$, this
indicates that, while $V_A(X)$ is less sensitive to changes in $X^1$ than in $X^2$,
there does not exist a stationary ridge through the entire design region.

4. Stationarity. The distance D of the design origin to the estimated
stationary (minimum) point is

$$D = \left[ \sum_{k=1}^{K} (X_S^k)^2 \right]^{1/2} \qquad (4.55)$$

and provides a measure of how close the optimum point of the fitted
response $V_A(X)$ is to the center of the experimental design (Box
and Draper 1987). In the case of APL1P $D = [(-.4658)^2 + (-.0146)^2]^{1/2} = .466$,
as seen in Figure 4.4.

For  closely  located  stationary  points  and  design  centers,  a  second  canonical  form 
—  the  B  Canonical  Analysis  —  can  further  simplify  the  polynomial 
approximation  (4.53)  by  centering  the  rotated  axes  X  around  the  stationary  point 
Xs. 


4.4.3  B  Canonical  Form 

Box  and  Draper  (1987)  suggest  employing  the  B  canonical  analysis 
whenever  the  experimental  design  and  stationary  point  of  the  fitted  surface  are 
approximately  the  same  (i.e.,  the  fitted  surface  closely  approximates  the  true 
response),  and  suggest  D  <  1  as  a  benchmark.  This  transformation  occurs  using 
the  relationship 

$$-2 \Lambda X_S = \theta \qquad (4.56)$$

which gives the fitted response

$$Z(X_S) = a_0 + \tfrac{1}{2} X_S^T \theta. \qquad (4.57)$$

Defining $\tilde{X}^k$ as

$$\tilde{X}^k = X^k - X_S^k \qquad (4.58)$$

and its corresponding vector $\tilde{X} = [\tilde{X}^1, \ldots, \tilde{X}^K]^T$ produces the B canonical form of
(4.48)

$$V_B(\tilde{X}) = Z(X_S) + \tilde{X}^T \Lambda \tilde{X} \qquad (4.59)$$

and of (4.49)

$$V_B(\tilde{X}) = Z(X_S) + \sum_{k=1}^{K} \lambda^k (\tilde{X}^k)^2 \qquad (4.60)$$

where $V_B(X - X_S) = V_A(X)$ (Box and Draper 1987). For the example problem
APL1P, the B canonical form becomes

$$V_B(\tilde{X}) = 24593 + 229.8 (\tilde{X}^1)^2 + 996.5 (\tilde{X}^2)^2 \qquad (4.61)$$

as indicated by the dashed lines in Figure 4.4.
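
The relationship between the A and B canonical forms can be checked numerically. The Python sketch below evaluates (4.57) for APL1P from the values in (4.52) and (4.53), then confirms that the B form, written in coordinates shifted to the stationary point, returns the same response as the A form. The function names `v_a` and `v_b` and the test point are hypothetical.

```python
a0 = 24643.0
theta = (214.089, 29.195)     # transformed linear coefficients, (4.52)
lam = (229.769, 996.507)      # eigenvalues, (4.52)

# Stationary point in rotated coordinates and its fitted response, (4.57)
x_s = [-t / (2.0 * l) for t, l in zip(theta, lam)]
z_s = a0 + 0.5 * sum(x * t for x, t in zip(x_s, theta))


def v_a(X):
    """A canonical form: rotated axes centered on the design origin."""
    linear = sum(t * x for t, x in zip(theta, X))
    quadratic = sum(l * x * x for l, x in zip(lam, X))
    return a0 + linear + quadratic


def v_b(X_tilde):
    """B canonical form: rotated axes centered on the stationary point."""
    return z_s + sum(l * x * x for l, x in zip(lam, X_tilde))


X = (0.3, -0.2)                                   # arbitrary point in rotated coordinates
X_tilde = [x - s for x, s in zip(X, x_s)]
print(round(z_s, 1))
print(round(v_a(X), 3), round(v_b(X_tilde), 3))   # the two forms agree
```

With these inputs (4.57) evaluates to roughly 24593, and the linear terms disappear from the B form because the shifted origin sits at the minimum of the fitted surface.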

4.4.4  Ridge  Analysis 

The final aspect of RSM employed by this dissertation provides the best
insight into the recourse problem: the direction of minimum and maximum
sensitivity of the fitted response V(x) to changes in $x_k$. Again paraphrasing Box
and Draper (1987), the idea starts with the assumption that for the fitted response
there exist p eigenvalues whose small values imply a p-dimensional ridge.
Indexing the eigenvalues from largest to smallest gives
$\lambda = [\lambda_1, ..., \lambda_{K-p}, \lambda_{K-p+1}, ..., \lambda_K]^T$
and their respective rotated axes
$X = [X_1, ..., X_{K-p}, X_{K-p+1}, ..., X_K]^T$.
The distance $D_R$ from the design centerpoint to the ridge is then

$D_R = \left( \sum_{k=1}^{K-p} X_{kS}^2 \right)^{1/2}$  (4.62)

and the coordinates of the rotated system nearest the ridge are
$X_S = [X_{1S}, ..., X_{K-p,S}, 0, ..., 0]^T$. Proceeding from $X_S$ and moving exactly one unit in either direction of
$X_K$ provides three sample responses

(1) $Z(X_S)$, (2) $Z(X_S) + \theta_K + \lambda_K$, and (3) $Z(X_S) - \theta_K + \lambda_K$,  (4.63)

whose average and sample variance are

$Z(X_S) + \tfrac{2}{3}\lambda_K$  (4.64)

and

$(\theta_K)^2 + \tfrac{1}{3}(\lambda_K)^2,$  (4.65)

respectively. Since the standard deviation of a normal distribution can be
estimated with three samples using the relationship

$\hat{\sigma} \approx \dfrac{\text{sampling range}}{3^{1/2}}$  (4.66)

the approximate sample range $r_K$ for $X_S$ in the direction of the rotated axis $X_K$
within the CC portion of the experimental design is

$r_K = [3(\theta_K)^2 + (\lambda_K)^2]^{1/2}$  (4.67)

(Box and Draper 1987).
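
The step from the three sample responses to (4.67) can be checked directly. The short derivation below is a worked sketch, writing $a = Z(X_S)$, $\theta = \theta_K$, and $\lambda = \lambda_K$ for brevity (shorthand introduced only for this check) and using the usual $n - 1 = 2$ divisor for the sample variance.

```latex
\begin{align*}
\bar{z} &= \tfrac{1}{3}\bigl[a + (a + \theta + \lambda) + (a - \theta + \lambda)\bigr]
         = a + \tfrac{2}{3}\lambda \\
s^2 &= \tfrac{1}{2}\Bigl[\bigl(-\tfrac{2}{3}\lambda\bigr)^2
       + \bigl(\theta + \tfrac{1}{3}\lambda\bigr)^2
       + \bigl(-\theta + \tfrac{1}{3}\lambda\bigr)^2\Bigr]
     = \theta^2 + \tfrac{1}{3}\lambda^2 \\
r &\approx 3^{1/2}\, s = \bigl[3\theta^2 + \lambda^2\bigr]^{1/2}
\end{align*}
```

The last line applies the range relationship (4.66) to the sample standard deviation.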

Applying (4.67) to APL1P produces

$r_1 = [3(214.09)^2 + (229.8)^2]^{1/2} = 436.25$  (4.68)

for $X_1$, and

$r_2 = [3(29.19)^2 + (996.5)^2]^{1/2} = 997.78$  (4.69)

for $X_2$. Results (4.68) and (4.69) algebraically confirm what Figure 4.4 shows: the
minimum sensitivity lies along the rotated $X_1$ axis, with Z(x) expected to be no
more than 24643 + 436 = 25079 within plus or minus one unit distance from $X_S$.
Equally important, these results show that movement along the $X_2$ axis provides
the worst deviation from the optimal solution, and thus should be avoided. Since
the original design center point essentially lies on the $X_1$ axis, the analysis can
effectively conclude that near-optimal solutions lie on a ridge defined by the
relation

x = (1800, 1570) + p(.7396, -.6730);  p ∈ [-1082, +1082].  (4.70)

This relationship can be presented to the decision-maker, in even simpler terms,
to provide the basic insight into the problem solution.
APL1P Response Analysis Summary. A near-optimal solution requires
(1) a total combined investment of 3370 for $x_1$ and $x_2$; (2) a one-for-one
tradeoff between $x_1$ and $x_2$ starting at (1800,1570); and (3) neither $x_i$ <
1000. Generally speaking, the total amount of installed capacity is more
important than how it is split between the generators.
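
The arithmetic behind (4.68) through (4.70) is easy to reproduce. The sketch below is illustrative only: it hard-codes the coefficient values quoted in the text, and the function and variable names (`sample_range`, `ridge_point`, and so on) are hypothetical rather than taken from the dissertation's code.

```python
import math

# Values quoted in (4.68)-(4.70) for APL1P.
THETA = (214.09, 29.19)         # linear coefficients on rotated axes X_1, X_2
LAMBDA = (229.8, 996.5)         # eigenvalues for rotated axes X_1, X_2
CENTER = (1800.0, 1570.0)       # ridge origin in (4.70)
DIRECTION = (0.7396, -0.6730)   # ridge direction in (4.70)


def sample_range(theta_k, lambda_k):
    """Approximate sample range r_k along one rotated axis, as in (4.67)."""
    return math.sqrt(3.0 * theta_k ** 2 + lambda_k ** 2)


def ridge_point(p):
    """Point on the minima ridge (4.70) for scalar multiple p."""
    return tuple(c + p * d for c, d in zip(CENTER, DIRECTION))


ranges = [sample_range(t, lam) for t, lam in zip(THETA, LAMBDA)]
points = [ridge_point(p) for p in (-1082, -541, 0, 541, 1082)]

for k, r in enumerate(ranges, start=1):
    print(f"r_{k} = {r:.2f}")
for pt in points:
    print(tuple(round(v) for v in pt))
```

Rounded to whole units, the five ridge points coincide with the x values later examined along the minima ridge in Table 4.6.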

4.5  Distribution  Analysis 
4.5.1  Introduction 

McKay (1992) cites two primary questions asked about the uncertainty of
the output of any simulation model: 'What Influences It?' and 'How Large?'. The
previous section shows how RSM answers the first question regarding which
decision variables affect the response and by how much. Typically, such analysis
also presents the decision-maker with a range of multiple optimal or near-optimal
solutions due to the 'flatness' of the region of optimality, thus allowing the
use of subjective criteria and individual judgment not captured in the original
model. However, for stochastic linear programming problems, McKay's second
question remains unanswered. Until now, both the literature and this dissertation
have focused on the expected value of the recourse function (3.1b), assuming it to be the
primary decision criterion (in conjunction with known first-stage costs). Instead,
this dissertation contends most decision-makers want a range of possible
outcomes based on their decision, not just an average. This thesis also contends
that such knowledge will narrow the choice of solution in near-optimal situations
by comparing the range of possible outcomes for each x and selecting the one
with the smallest variance, best worst-case scenario, best best-case scenario, or
some other criterion. In effect, it introduces the underlying distribution as a
decision criterion for two-stage stochastic linear programming problems with
recourse.

Incorporating the variability of h(x,ω,T) in the modeling process
represents this dissertation's other major contribution to this area of research. This
section inaugurates this topic by reviewing its distributional features, then
outlining the measures of uncertainty used to characterize it. It closes the chapter
by applying these techniques to the example problem APL1P and proposing a final
solution recommendation.

4.5.2  Distributional  Characteristics 

Deciding which measure of uncertainty to employ depends upon the
attributes (or assumptions) regarding the underlying distribution. Redesignating
the k subscript to denote a distinct solution $x_k$, recall the definition of $z_{ik}$ as the
optimal value of the recourse function (3.1b) plus $cx_k$ for the $i$th realization of ω
and T; and the random variable $z_k$ distributed as $cx_k + h(x_k, \omega, T)$, where $E[z_k] = Z(x_k)$.
Since the distribution of $z_k$ is a function of x, its form and parameters will
vary throughout the feasible region X, and must be estimated for larger problems.
For smaller problems where the population distribution can be
determined for $x_k$, statistical estimation is not required. Most importantly, it
is very unlikely that $z_k$ will follow a unimodal or symmetric distribution.

Empirical evidence from APL1P suggests this asymmetry occurs even for
the simplest problems. Using the same $x_k$ points from Table 4.2, Table 4.5 lists
the population mean, median, and standard deviation, and the lowest and highest
possible values for $z_k$. The differences in the mean and median of $z_k$ suggest
varying degrees of symmetry of the underlying distribution, while the differences
in standard deviations indicate differences in dispersion as well. This lack of
symmetry does not, of course, mean that standard statistical analysis techniques
employing the central limit theorem, such as confidence interval estimation
regarding Z(x), do not apply. Furthermore, the literature does not suggest that $z_k$
itself is normally distributed, or should even follow a symmetric distribution. It
does mean, however, that any assumptions made by the decision-maker regarding
the distributional form of $z_k$ can be misleading.

Table 4.5

Distribution Parameters for VRT Samples in Table 4.2 for APL1P

x_k            Mean z_k = Z(x_k)   Median z_k   S.D. of z_k   Min z_k    Max z_k
(1800,1800)    24689.1             26160.0      4808.2        18270.0    45990.0
(900,900)      26425.4             27705.0      3553.5        17550.0    40995.0
(900,2700)     25131.3             25265.0      5207.5        18720.0    45495.0
(2700,900)     25299.3             25072.0      5282.0        19170.0    46485.0
(2700,2700)    27499.3             27210.0      4070.5        23670.0    50985.0

Figures 4.5 and 4.6 graphically show the advantages of more accurate
distributional analysis, and illustrate the skewed nature of $z_k$, by comparing
population histograms for the points (1800,1800) and (2700,2700), respectively,
to normal distributions with the same mean, variance, and area (based on a
presentation idea by Bradley 1968). Both samples share, in varying degrees,
a heavily skewed dispersal favoring the area just below the average, an attenuated
trimodal shape, and a significant presence of high-cost solutions far above what
occurs in the upper tail of a normal distribution. Clearly, any assessment using
normal or symmetric assumptions on the part of the decision-maker would not be
appropriate in either case; such presuppositions would overstate the frequency
of low-cost solutions while understating the high ones. Additionally, for
non-symmetric distributions in general, the variance of $z_k$ does not adequately
characterize its coverage (McKay 1992).

Figure 4.5. Comparison of Population Distribution of (1800,1800) in APL1P
to Normal Distribution with Same Mean, Variance, and Area
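
One way to see how far the high-cost outcomes sit beyond a normal upper tail is to express the population maximum in standard-deviation units. The sketch below is illustrative only: it uses the mean, standard deviation, and maximum reported in Table 4.5 for the two plotted points, and the helper names are hypothetical.

```python
import math

# (mean, standard deviation, maximum) of z_k, copied from Table 4.5.
TABLE_4_5 = {
    (1800, 1800): (24689.1, 4808.2, 45990.0),
    (2700, 2700): (27499.3, 4070.5, 50985.0),
}


def normal_upper_tail(z):
    """P(N(0,1) > z), computed with the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def max_z_score(mean, sd, maximum):
    """Distance of the population maximum above the mean, in S.D. units."""
    return (maximum - mean) / sd


results = {}
for x, (mean, sd, maximum) in TABLE_4_5.items():
    z = max_z_score(mean, sd, maximum)
    results[x] = (z, normal_upper_tail(z))
    print(x, f"z = {z:.2f}", f"normal tail = {results[x][1]:.2e}")
```

Both maxima lie more than four standard deviations above their means, a region to which a normal distribution with the same mean and variance assigns almost no probability, while the population histograms place visible mass there.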

Figure  4.6.  Comparison  of  Population  Distribution  of  (2700,2700)  in  APL1P 
to  Normal  Distribution  with  Same  Mean,  Variance,  and  Area 

Although difficult to predict in its exact form, intuitively it is not
surprising that the distribution of $z_k$ possesses such characteristics. As an
optimization function, it would shift bases to minimize changing resource
demands in $\omega_i - T_i x_k$, thus mitigating somewhat their effects on the cost.
Furthermore, marginal increases in $\omega_i - T_i x_k$ over $\omega_j - T_j x_k$ ($j \ne i$) may occur in
slack resources that would have no effect in increasing the solution value $z_{ik}$ from
$z_{jk}$; thus, the 'bunching' effect observed in Figures 4.5 and 4.6. Consequently,
higher values of $z_k$ should occur disproportionately fewer times than the
distribution of the elements in ω and T would suggest.
Based on empirical evidence as seen in Figures 4.5 and 4.6, and in
additional preliminary research, this dissertation contends that non-parametric, or
distribution-free, statistical analysis presents an excellent option for analyzing the
range of coverage of $z_k$ for reasons of simplicity, computational efficiency, and
minimal assumptions of the underlying distribution (see Bradley 1968).
Consequently, following Wilson's (1995) suggestion, this research investigates the
distribution of h(x,ω,T) by employing tolerance limits as the basic measure of
interest for $z_k$. As a non-parametric statistic, it possesses characteristics that make
its application appealing in the context of the stochastic recourse problem. More
importantly, it provides the necessary information about the dispersal of $z_k$; and,
in conjunction with the response surface approximation of Z(x), gives the
decision-maker a truly useful description of the stochastic behavior of (3.1).

4.5.3  Tolerance  Limits 

Quoting  Conover 

...  confidence  intervals  ...  provide  interval  estimates  for  unknown 
population  parameters,  such  as  the  unknown  probability  p  or  the  unknown 
quantile  xp,  and  a  certain  probability  1  -  a  (confidence  coefficient)  that  the 
unknown  parameter  is  within  the  interval.  Tolerance  limits  differ  from 
confidence  intervals  in  that  tolerance  limits  provide  an  interval  within 
which  at  least  a  proportion  q  of  the  population  [emphasis  added]  lies,  with 
probability  1  -  a  or  more  that  the  stated  interval  does  indeed  "contain"  the 
proportion q of the population (Conover 1980, 117).

Thus, by estimating the population range, tolerance limits provide a measure of
'coverage' for asymmetrical distributions that conveys a useful characterization of
the distribution of $z_k$ to the decision-maker. Adapting Conover's (1980) notation
and terminology to the recourse context gives the following definitions.

Definition. Let $z_i$ represent $cx + h(x, \omega_i, T_i)$ for the $i$th realization of a
random sample of ω and T, $i = 1, ..., I$ (again, dropping the k subscript for
convenience).

Definition. Let the parameters r and m, where $r \le m$, index the ordered
sample $z_1 \le ... \le z_r \le ... \le z_m \le ... \le z_I$, with $1 \le r \le m \le I$. Furthermore, let
$z_0 = -\infty$ and $z_{I+1} = +\infty$.

Conover describes the tolerance interval approach as determining the sample size
I such that, with a probability of at least 1 - α, no less than a q portion of the
population lies between $z_r$ and $z_{I+1-m}$, where q, r, m, and α are predetermined.
Note that this formulation allows either one-sided tolerance limits (r or m equals
zero) or two-sided tolerance intervals (r and m not equal to zero), and that the
required sample size can be approximated with the relationship

$I \approx \tfrac{1}{4}\,\chi^2_{1-\alpha}\,\dfrac{1+q}{1-q} + \tfrac{1}{2}(r + m - 1)$  (4.71)

where $\chi^2_{1-\alpha}$ is the (1 - α) quantile of a chi-squared distribution based on 2(r + m)
degrees of freedom (Conover 1980).

Examining (4.71) shows that the required sample size increases the most
for higher proportions of the coverage percentage q, and less so for increases in
the confidence level 1 - α and indices r and m. Striking a balance between these
factors and the computational requirements of sampling h(x,ω,T), this dissertation
will set r = 1 and m = 1, q = .95, and α = .01, giving a sample size I = 130. Note
that this sample size does not depend upon the distribution's form or size, and thus
can be applied for any stochastic recourse problem.
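
The sample size quoted above can be reproduced from (4.71) with nothing beyond the standard library. The sketch below is illustrative: it assumes the parameter choices r = m = 1, 95% coverage, and α = .01, and it exploits the fact that 2(r + m) is always even, so the chi-squared distribution function has a closed form. The function names are hypothetical.

```python
import math


def chi2_cdf_even(x, dof):
    """Chi-squared CDF for an even number of degrees of freedom.

    For dof = 2k, F(x) = 1 - exp(-x/2) * sum_{j<k} (x/2)^j / j!.
    """
    k = dof // 2
    half = x / 2.0
    term, total = 1.0, 1.0
    for j in range(1, k):
        term *= half / j
        total += term
    return 1.0 - math.exp(-half) * total


def chi2_quantile_even(prob, dof):
    """Quantile of the chi-squared distribution (even dof) by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_cdf_even(hi, dof) < prob:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if chi2_cdf_even(mid, dof) < prob:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def tolerance_sample_size(q, alpha, r, m):
    """Approximate sample size from (4.71); round up before use."""
    chi2 = chi2_quantile_even(1.0 - alpha, 2 * (r + m))
    return 0.25 * chi2 * (1.0 + q) / (1.0 - q) + 0.5 * (r + m - 1)


size = tolerance_sample_size(q=0.95, alpha=0.01, r=1, m=1)
print(math.ceil(size))
```

Raising q toward 1 inflates the (1 + q)/(1 - q) factor quickly, which is the sensitivity to the coverage proportion noted in the text.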

4.5.4  Example  Distribution  Analysis 

Demonstrating the tolerance limits method on APL1P, Table 4.6 shows
the results for five equidistant $x_k$ taken along the minima ridge defined by (4.70).
First, note that the probability coverage meets or exceeds 95% for all points
except (1000,2298), which falls slightly below the targeted proportion q. Second,
in general the tolerance limits provide a very good approximation of the range for
both individual estimates of $x_k$ and comparisons across the ridge. For instance,
the tolerance limits' upward trend of the estimated maximum $z_k$ with respect to
increases in p matches the actual upward trend of the population maximum.
Third, both the tolerance limits and Z(x) tend to favor the minima ridge from the
optimal point to the stationary point $X_S$ of the response surface approximation.
Fourth, there does exist a difference between the population extremes and those
found by tolerance limit sampling. However, the purpose of tolerance limits is to
estimate the range we can expect $z_k$ to fall into for a q portion of the time, not
to provide point estimators. Finally, Figures 4.7 and 4.8 graphically compare the
tolerance limits data in Table 4.6 to the distribution of $z_k$ for the two best sampled
points with respect to Z(x) (i.e., (1400,1934) and (1800,1570)).
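
The mechanics of the method can be sketched with a small simulation. The example below is not the dissertation's FORTRAN code: it substitutes a unit exponential distribution (chosen only because it is skewed and its coverage can be computed exactly) for the unknown distribution of $z_k$, takes the sample minimum and maximum of 130 draws as the r = m = 1 tolerance limits, and counts how often those limits cover at least 95% of the population. All names and the seed are hypothetical.

```python
import math
import random


def tolerance_limits(sample, r=1, m=1):
    """Return (z_(r), z_(I+1-m)) from the ordered sample; assumes r, m >= 1."""
    ordered = sorted(sample)
    n = len(ordered)
    return ordered[r - 1], ordered[n - m]


def exponential_coverage(lower, upper):
    """Share of a unit exponential population lying in [lower, upper]."""
    return math.exp(-lower) - math.exp(-upper)


def fraction_meeting_target(trials=2000, size=130, q=0.95, seed=12345):
    """Fraction of repeated samples whose limits cover at least q."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.expovariate(1.0) for _ in range(size)]
        lower, upper = tolerance_limits(sample)
        if exponential_coverage(lower, upper) >= q:
            hits += 1
    return hits / trials


share = fraction_meeting_target()
print(f"share of samples with coverage >= 0.95: {share:.3f}")
```

Because the method is distribution-free, the same experiment with any other continuous distribution should give a share near the nominal confidence level, which is the property that makes a single sample size usable across recourse problems.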
Table 4.6

Tolerance Limits for APL1P Minima Ridge (Random Seed = 674303674)

                                             Population           Tol. Limit
p        x_k           Z(x)     Mdn. z_k   Min z_k   Max z_k   Min z_k   Max z_k   % Cvg.
-1082    (1000,2298)   24828    25493      17915     44795     19193     38727     .9493
-541     (1400,1934)   24689    26285      17805     45105     18405     40364     .9864
0        (1800,1570)   24642    26075      17695     45415     18695     41002     .9610
541      (2200,1206)   24801    25415      18185     45725     18535     41725     .9861
1082     (2600,842)    25230    25093      18675     46035     18925     43035     .9921

Figure  4.7.  Comparison  of  Tolerance  Limits  to  Population  Distribution, 
Mean,  and  Median  for  (1400,1934)  in  APL1P 

Figure  4.8.  Comparison  of  Tolerance  Limits  to  Population  Distribution, 
Mean,  and  Median  for  (1800,1570)  in  APL1P 


These sample points indicate that $x_k$ = (1400,1934) offers a lower range of
possible $z_k$ values without sacrificing much of its expected value. Based on these
results, the distributional analysis suggests restricting the final choice by
modifying (4.70) to be

x = (1400, 1934) + p(.7396, -.6730),  p ∈ [0, 541],  (4.72)

and presenting to the decision-maker the following guidance.

APL1P Distributional Analysis Summary. A near-optimal solution
requires (1) a total combined investment of 3350 ±20 for x; and (2) a
one-for-one exchange between $x_1$ and $x_2$ starting at (1400,1934) and
ending at (1800,1570). Furthermore, as the solution moves away from
(1400,1934) expect a tradeoff between slightly lowering the expected cost
and marginally increasing the expense of the worst-case scenario.

Finally, it should be noted that using distribution-free statistics like
tolerance limits does not diminish the utility of basing the response surface
on Z(x). First, and most significantly, both the search techniques and the
response surface analysis fundamentally depend on the convexity of Z(x). Thus,
changing the search and approximation criteria away from Z(x) poses significant
challenges that are beyond the scope of this research. Second, intuitively it
appears likely that the range of $z_k$ will be a function of x; for example, the lowest
maximum value of $z_k$ for the entire set X will probably occur somewhere in or
near the region of optimal or near-optimal solutions. Therefore, using x* or x' as a
starting point for the response surface and distributional analysis can still be
justified. Finally, this research aims to supplement the use of Z(x), not replace
it; consequently, distributional analysis presents a natural and obvious method for
providing the decision-maker with additional insight into stochastic linear
programming problems with recourse.
Chapter 5


Problem  Set  Analysis 


5.1  Introduction 

This  chapter  demonstrates  the  response  surface  analysis  approach  outlined 
in  Chapters  3  and  4  on  the  set  of  problems  listed  in  Table  5.1.  The  subsequent 
sections  each  cover  a  single  problem  by  providing  a  brief  formulation  and 
description  followed  by  an  analysis  similar  in  structure  and  content  to  the  one 
presented  for  APL1P  (deviations  from  the  prescribed  algorithms  are  noted  as 
well).  These  problems  have  also  been  independently  solved  (Morton  1994c),  thus 
providing  a  benchmark  for  confirming  the  optimal  solutions.  Table  5.1  also 
shows  which  optimization  and  analysis  techniques  from  Chapters  3  and  4  are  used 
for  each  problem. 

As  in  the  case  of  APL1P,  the  computational  environment  consists  of  an 
IBM  RS/6000  Model  320  running  under  AIX  3.2  and  FORTRAN  90  for  the  OBS 
and RSA programs, while SAS Version 6.08 running on OpenVAX/VMS
provides  the  response  surface  analysis.  As  before,  all  timing  results  are  based  on 
AIX's  estimate  of  CPU  code  execution,  and  do  not  include  system  overhead  or  I/O 
requirements.  The  code  also  requires  linking  to  IBM's  OSL  and  IMSL  libraries. 
Finally,  due  to  the  size  of  some  problems  this  chapter  depicts  only  the  relevant 
portions  of  the  formulation. 
Table 5.1

Application Summary of Optimization and Statistical Analysis Techniques

Problem Set Description          APL1P    PGP2     CEP1     4TERM     20TERM
# of x Variables                 2        4        4        15        63
Rows/Columns in A                2/2      2/4      9/8      3/15      3/63
Rows/Columns in W                5/9      7/16     7/15     28/146    124/764
# of Random Variables in ω/T     5        3        3        8         40
# of Scenarios                   1280     576      216      256       1.095 x 10^12

Optimization                     APL1P    PGP2     CEP1     4TERM     20TERM
Geometric Simplex                Yes      Yes      Yes      Yes       --
Projected Gradient               Yes      Yes      Yes      Yes       Yes
PARTAN                           Yes      Yes      Yes      Yes       --
OSL                              Yes      Yes      Yes      Yes       Yes
OBS-Complete / ODV               Yes      Yes      Yes      --        --
OBS-Reset                        --       --       --       Yes       --

Statistical Analysis             APL1P    PGP2     CEP1     4TERM     20TERM
Control Variates                 Yes      Yes      Yes      Yes       Yes‡
Latin Hypercube                  Yes      Yes      Yes      Yes       Yes
Population†                      Yes      Yes      Yes      Yes       --
Full Exper. Design               Yes      Yes      --       --        --
Fractional Exper. Design         --       --       --       Yes       Yes
Prelim. Screening Design         --       --       --       --        Yes
Resp. Surf. (Minima Ridge)       Yes      Yes      --       Yes       --
Resp. Surf. (Maxima Ridge)       Yes      Yes      --       Yes       Yes
Tolerance Limits                 Yes      Yes      Yes      Yes       Yes

† - 'Population' implies all scenarios evaluated for validation of VRTs and Tol. Limits.
Additionally, true expected values used in experimental design when available.

‡ - Used only for comparison with LHs; NOT used for reported response surface estimates.


5.2  PGP2 

5.2.1  PGP2  Problem  Description 

PGP2  represents  a  power  generation  expansion  problem  developed  by 
Louveaux  and  Smeers  (1988)  and  modified  for  use  by  the  University  of  Michigan 
FTP  site  (Holmes  1995).  Figure  5.1  on  the  following  page  gives  its  formulation, 
whose  recourse  configuration  contains  the  transportation  problem  structure  found 
in many capacity expansion problems. One difference from APL1P lies with the
first-stage constraints; in PGP2's case, Ax = b includes both a minimum supply
capacity and a first-stage capital improvement budget constraint. PGP2 also
limits its source of variability to the right-side vector ω of the recourse problem,
whose discrete representation allows for 576 possible demand scenarios. Finally,
the  recourse  error  vectors  zJ  (with  objective  coefficient  values  of  1,000)  ensure 
feasibility for all possible values of x and ω.

5.2.2  PGP2  Optimization  Results 

The  OBS  algorithm  found  32  optimal  bases  and  their  respective  dual 
vectors  for  PGP2  (Table  5.2),  with  a  frequency  of  occurrence  listed  in  Table  5.3. 
The  OBS-Complete  algorithm  subsequently  found  one  additional  basis  during  the 
line search portion of the PARTAN search. The frequency distribution of the
optimal  basis  set  for  PGP2  shows  less  concentration  of  optimality  than  seen  with 
APL1P;  e.g.,  92%  of  optimality  occurs  under  bases  #1  through  #8  for  APL1P 
versus  74%  for  PGP2.  This  greater  dispersion,  combined  with  the  larger  number 
of  dual  vectors,  results  in  the  OBS-Complete  and  ODV  algorithms  turning  in 
comparable  computation  times  for  all  three  search  techniques  (Table  5.4).  Both 
options  turn  in  performance  times  an  order  of  magnitude  better  than  OSL  alone. 
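
The two comparisons in this paragraph can be recomputed from the tabulated values. The sketch below is illustrative only; it copies the frequencies from Table 5.3 and the CPU times from Table 5.4, and the variable names are hypothetical.

```python
from itertools import accumulate

# Frequency of optimality for bases 1-12 and the pooled bases 13-32 (Table 5.3).
FREQ = [.19, .19, .09, .09, .06, .05, .04, .03, .03, .03, .02, .02, .14]

# CPU seconds from Table 5.4: algorithm -> (OSL, OBS, ODV).
CPU = {
    "Geometric Simplex": (861.67, 96.3, 87.62),
    "Projected Gradient": (1028.41, 66.24, 65.99),
    "PARTAN": (212.22, 14.75, 13.54),
}

cumulative = list(accumulate(FREQ))
share_first_8 = cumulative[7]  # share of optimality covered by bases #1-#8

# Speedup of the OBS and ODV options relative to solving with OSL alone.
speedups = {
    name: (osl / obs, osl / odv) for name, (osl, obs, odv) in CPU.items()
}

print(f"bases 1-8 cover {share_first_8:.2f} of optimality")
for name, (obs_gain, odv_gain) in speedups.items():
    print(f"{name}: OBS x{obs_gain:.1f}, ODV x{odv_gain:.1f}")
```

The speedups fall between roughly 9 and 16, consistent with the "order of magnitude" description, and the OBS and ODV gains for each algorithm are close to one another.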

Due  to  the  relatively  small  number  of  scenarios  in  PGP2,  all  search 
algorithms  calculate  the  exact  value  of  Z(x)  for  this  problem.  In  turn,  this  lack  of 
experimental  error  most  likely  causes  the  Geometric  Simplex  Algorithm  to 
Table 5.2

OBS-Complete Results for PGP2

Sample Size    Random # Seed   All Bases /       # Opt. Bases /   CPU Time (secs)
(ω - Tx)                       First Optimal*    Dual Vectors
1,000          3203801         All               21               15.34
5,000          7733099         All               25               33.24
50,000         11351           First             31               161.95
250,000        603939541       First             32†              229.48

* - 'First Optimal' option skips any remaining bases after finding first feasible, whereas 'All
Bases' checks every basis in P for each sample (ω - Tx).
† - Subsequent search routines found 1 additional optimal basis.


Table 5.3

Frequency of Basis Optimality for PGP2
(Based on 4th Run from Table 5.2)

Basis ID #            1     2     3     4     5     6     7     8     9     10    11    12    13-32
Freq. of Optimality   .19   .19   .09   .09   .06   .05   .04   .03   .03   .03   .02   .02   .14
Cumulative Freq.*     .19   .38   .47   .56   .62   .67   .71   .74   .78   .81   .83   .86   1.0

* - May not add due to roundoff error.


Table 5.4

Computation Times of OSL/OBS/ODV Options for Geometric Simplex,
Projected Gradient, and PARTAN Algorithms for PGP2 (in seconds)

Algorithm                             OSL       OBS*     ODV
GEOMETRIC SIMPLEX (120 Iterations)    861.67    96.3     87.62
PROJECTED GRADIENT (22 Iterations)    1028.41   66.24    65.99
PARTAN (2 Iterations)                 212.22    14.75    13.54

* - PARTAN search found one additional optimal basis.


contract  on  its  best  vertex.  As  shown  in  Table  5.5,  the  algorithm  initially 
progresses  beyond  its  starting  set  of  vertex  points,  but  by  iteration  40  settles  into 
slowly  contracting  about  its  best  vertex.  At  k  =  86,  it  further  shrinks  by 
contracting, then flipping about the best point and enlarging when further
improving  moves  are  not  possible.  Another  contraction  follows  the  enlargement 
move  when  it  does  not  provide  any  better  vertices;  and,  when  the  final  contraction 
offers  no  help  either,  the  simplex  re-initializes  itself  and  proceeds  with  iteration 
86.  Having  kept  the  best  vertex  from  the  previous  simplex  set,  the  procedure 
settles  back  into  its  slow  contraction  pattern  by  k  =  100  with  little  improvement  in 
the  objective  function  value.  This  premature  contraction  most  likely  occurs  since 


Table 5.5

Selected Geometric Simplex Moves for PGP2
(Random Seed = 296279)

k       x_1, x_2, x_3, x_4        Z(x)     Simplex Move            Replaces
0       4.00, 0.00, 5.00, 6.00    504.40   Initial Vertex (x_ev)   --
1       6.99, 0.77, 5.18, 6.45    466.40   Expansion x_e           5
3       5.47, 0.84, 5.94, 4.38    455.02   Contraction x_c         5
10      5.10, 1.29, 6.12, 4.96    453.84   Contraction x_c         5
20      2.76, 1.99, 5.69, 7.35    450.39   Contraction x_c         5
40      3.02, 1.98, 5.88, 6.77    449.94   Contraction x_c         5
85      3.12, 1.93, 5.88, 6.69    449.88   Contraction x_c         5
85-86   --                        --       Shrinkage               --
85-86   --                        --       Enlargement             --
85-86   --                        --       Shrinkage               --
85-86   --                        --       New Simplex             --
86      7.79, 1.95, 3.13, 8.24    475.74   Expansion x_e           5
90      4.64, 2.44, 5.44, 4.07    452.16   Expansion x_e           5
100     3.31, 1.70, 5.86, 6.48    449.68   Contraction x_c         5
110     1.54, 2.19, 6.64, 7.10    449.42   Expansion x_e           5
120     2.70, 1.81, 6.12, 6.82    449.32   Contraction x_c         5
every interior point of the simplex offers a slight improvement over the
next-worst vertex value; however, the algorithm does offer considerable improvement
over the expected value approximation $Z(x_{ev})$.

The Projected Gradient Algorithm (Table 5.6) performs very well in
the case of PGP2. Since it does not encounter either a binding constraint set or a
multiple optimal solution region, the algorithm suffers from not finding a zero
gradient due to the non-differentiable property. Consequently, terminating the
search requires an arbitrary stopping point y that in this case was determined by
feedback from the first set of iterations. After initially finding a good solution at
step 9 (based on $d_{jk}$ < -1.0, j = 1, ..., 4), the search 'overshoots' the optimal
solution in step 10 as evidenced by $d_{3k}$ = -1.71, even though $Z(x_{10}) < Z(x_9)$.
Therefore, by setting y = -.99, j = 1, ..., 4, the search continues the descent until
reaching step 22 where $d_k$ again is less than -.99 for each of its components.
Finally, the fact that solutions better than step 11's do not occur in steps 13
through 20 tends to confirm $x_{22}$ as a near-optimal solution.

Based on the results of the Projected Gradient Algorithm, the
PARTAN search (Table 5.7) stops after reaching a similar solution with respect to
Z(x) and $d_k$; using this prior knowledge allows the search to conclude after two
iterations. One notable difference between this result and the proposed algorithm
in Chapter 3 concerns the differences between the estimated optimal scalar
multiple and the one actually used. In PGP2, the quadratic fit of the line spanning
the entire feasible region (based on $x_{k-1}$ and $p_k$) rarely exceeds an $R^2$ of .80. After
several preliminary trials, a subjective interpolation of the data points proved to
Table 5.6

Selected Projected Gradient Iterations for PGP2
(Scenarios = 576, Q = 8)

k    x_1, x_2, x_3, x_4        d_k                           Est. Q   Act. q   R^2    Z(x)
1    5.00, 5.00, 5.00, 5.00    8.70, 6.95, 8.97, 5.99        0.42     .42      .900   466.62
2    3.92, 4.37, 4.88, 4.74    6.65, 5.77, 5.65, 5.13        0.24     .24      .995   451.61
3    3.56, 4.08, 5.03, 4.54    0.58, -0.30, 0.85, 0.86       0.03     .03      .998   449.13
4    3.44, 4.16, 4.91, 4.76    0.58, -0.30, -0.43, -0.86     0.28     .28      .971   447.79
8    2.35, 4.32, 5.07, 5.53    -0.71, -0.95, -0.43, -0.85    -2.11    .01      .999   447.78
9    2.37, 4.40, 5.02, 5.59    -0.71, -0.95, -0.43, -0.85    -2.48    .01      .999   447.67
10   2.38, 4.47, 4.97, 5.65    -0.71, -0.95, -1.71, -0.84    -0.98    .005     .999   447.60
11   2.37, 4.48, 4.99, 5.65    -0.70, -0.94, -1.71, -0.84    -1.04    .005     .999   447.56
21   2.26, 4.57, 5.00, 5.66    -0.71, -0.95, -1.71, -0.84    -0.93    .002     .999   447.56
22   2.25, 4.57, 5.00, 5.65    -0.71, -0.95, -0.43, -0.84    -2.95    --       .999   447.55

Table 5.7

PARTAN Iterations for PGP2 (Scenarios = 576, Q = 8)

k     x_1, x_2, x_3, x_4        Est. Q*   Act. q   R^2    Z(x)
x_0   5.00, 5.00, 5.00, 5.00    .53       .71      .743   466.62
x_1   3.57, 4.17, 4.84, 4.65    .71       .71      .985   449.06
p_1   2.60, 4.29, 5.10, 5.47    .52       .70      .767   447.90
x_2   2.43, 4.24, 5.11, 5.50    .48       --       .791   447.84

give  better  results  than  following  the  regression's  estimate.  Consequently,  the 
results  reflect  this  deviation. 


5.2.3  PGP2  Response  Surface  Analysis 


A preliminary full-factorial CCD experimental design using $x_0$ = (2.25,
4.55, 5.00, 5.50) as the centerpoint and each factor's half-range consisting of ±0.5
gives a semi-definite fit (i.e., a saddle point) based on the eigenvalues and
eigenvectors reported in Table 5.8 (although the three negative eigenvalues indicate
only marginal downward curvature). Since this obviously represents an incorrect fit
based on the known convexity of the response, the final design departs from the
standard CCD guidance to induce a positive-definite outcome. Based on the large
contributions of $x_2$, $x_3$, and $x_4$ to those rotated axes with downward curvature, the
subsequent design drops the axial representation of $x_1$, and extends those of $x_2$, $x_3$,
and $x_4$ enough to leverage the regression into the correct fit. (The alternative
approach of increasing the size of the CCD would impose asymmetrical axial
points to remain feasible.) Additionally, the minima ridge estimates from the
preliminary design provided a slightly better centerpoint location for the final
design for PGP2 ($x_0$ = (2.271, 4.605, 5.045, 5.567)). Although these
modifications would normally destroy the uniform-precision or rotatability of the
design, in this case error resulting from response variability does not occur due to
using the exact values of Z(x). Consequently, the bias inherent in a polynomial
approximation constitutes the sole source of any lack of fit, and thus helps
mitigate the effects of altering the design.


Table 5.8

A Canonical Analysis of PGP2 (Preliminary CCD)

                            Eigenvectors
Eigenvalues      x1         x2         x3         x4
  25.4760      0.5064     0.4947     0.5190     0.4791
  -0.1255     -0.0222     0.3542    -0.7849     0.5080
  -0.5921     -0.7749    -0.0720     0.3337     0.5319
  -0.7009      0.3776    -0.7904    -0.0573     0.4790

Table 5.9 gives the final experimental design, Table 5.10 provides the
regression parameter estimates, and Table 5.11 supplies the eigenvalue,
eigenvector, and ridge results for PGP2. Although not readily apparent, the very
small differences between the Z(x) values of the factorial portion of the design
and the centerpoint's Z(x) give some indication of how a semi-definite fit can
occur without substantially higher axial response values. Tables 5.10 and 5.11
confirm a good fit with a high R2 and positive eigenvalues, respectively. Most
importantly, Table 5.11 provides the minima and maxima ridge analysis
associated with the reported eigenvalues and eigenvectors.

The  A  canonical  analysis  in  Table  5.11  shows  decision  variables  x2  and  x3 
as  roughly  equal  components  in  the  rotated  axis  containing  the  highest  amount  of 
curvature;  by  contrast,  x1  and  x4  dominate  those  rotated  axes  with  the  least 
increase  in  the  response  Z(x)  per  unit  change  from  the  centerpoint.  Consequently, 
the  minima  ridge  in  the  original  coordinate  system  occurs  along  a  vector  where 
increases  in  x2  and  x3  are  kept  to  a  minimum  by  lowering  x1  and  x4.  Reversing 
the  criteria  for  the  maxima  ridge  produces  a  vector  with  a  rapid  rise  in  the  two 
major  constituent  variables  of  the  first  eigenvector  (x2  and  x3),  while  significantly 
increasing  the  distant  third  contributor  x4  as  well. 

In conjunction with the minimum ridge results, Table 5.12 presents a
tolerance limits analysis based on the even coded radius points. The tolerance
limits succeed in covering the zk response values that will occur on average at
least 95% of the time; however, the skewed distribution in PGP2 can produce
values for single instances of zk considerably higher than indicated by either
Z(x) or the tolerance limit's test for the maximum value (zm). The scale of the
probabilities p(zk) and their respective values zk prevents graphing the
probability distribution (e.g., Figure 4.6).

Table 5.9

Experimental Design for PGP2

         Coded xk                Response
 x1      x2      x3      x4        Z(x)
 -1      -1      -1      -1      448.532
 -1      -1      -1      +1      448.015
 -1      -1      +1      -1      447.967
 -1      -1      +1      +1      447.627
 -1      +1      -1      -1      447.959
 -1      +1      -1      +1      447.620
 -1      +1      +1      -1      447.665
 -1      +1      +1      +1      449.648
 +1      -1      -1      -1      448.057
 +1      -1      -1      +1      447.717
 +1      -1      +1      -1      447.892
 +1      -1      +1      +1      449.875
 +1      +1      -1      -1      447.755
 +1      +1      -1      +1      449.738
 +1      +1      +1      -1      450.095
 +1      +1      +1      +1      452.159
  0       0       0      +3      450.566
  0       0       0      -3      448.130
  0       0      +6       0      455.702
  0       0      -6       0      455.121
  0      +6       0       0      454.375
  0      -6       0       0      454.043
  0       0       0       0      447.552

Uncoded Values

Coded Value     x1        x2             x3        x4
     0         2.271     4.605          5.045     5.567
    -1         2.071     [illegible]    4.845     5.367
    +1         2.471     [illegible]    5.245     5.767

Table 5.10

Regression Results for CC Design in Table 5.9 for PGP2

Analysis of Variance

Source    DF    Sum of Squares    Mean Square    R Square
Model     14       151.136          10.795        .9636
Error      8         5.715            .714
Total     22

Parameter Estimates

Variable      Coded Par. Est.    Uncod. Par. Est.
Intercept        447.552            1782.33
x1                 0.5159           -182.07
x2                 0.6102           -147.11
x3                 0.7515           -157.55
x4                 1.2163           -144.86
x1 · x1            0.4900             12.25
x1 · x2            2.0463              8.55
x2 · x2            6.6563              4.62
x1 · x3            2.2402              9.33
x2 · x3           12.2771              8.53
x3 · x3            7.8589              5.46
x1 · x4            0.9195              7.66
x2 · x4            5.5193              7.67
x3 · x4            5.5160              7.66
x4 · x4            1.7955              4.99
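
The coded and uncoded second-order estimates in Table 5.10 appear to differ
only by the scaling of each factor. The following sketch is a rough consistency
check, not the procedure used in the original analysis. It assumes that one
coded regression unit equals the largest uncoded deviation of a factor from the
centerpoint in Table 5.9: 0.2 for x1 (factorial points only), 6 x 0.2 = 1.2 for
x2 and x3, and 3 x 0.2 = 0.6 for x4. These half-ranges and the names
HALF_RANGE, SECOND_ORDER, uncode, and max_discrepancy are assumptions
introduced for illustration.

```python
# Assumed uncoded half-range of each factor over the whole design
# (inferred from Table 5.9; not stated explicitly in the text).
HALF_RANGE = {1: 0.2, 2: 1.2, 3: 1.2, 4: 0.6}

# Second-order terms of Table 5.10: (i, j) -> (coded estimate, uncoded estimate).
SECOND_ORDER = {
    (1, 1): (0.4900, 12.25),
    (1, 2): (2.0463, 8.55),
    (2, 2): (6.6563, 4.62),
    (1, 3): (2.2402, 9.33),
    (2, 3): (12.2771, 8.53),
    (3, 3): (7.8589, 5.46),
    (1, 4): (0.9195, 7.66),
    (2, 4): (5.5193, 7.67),
    (3, 4): (5.5160, 7.66),
    (4, 4): (1.7955, 4.99),
}


def uncode(coded, i, j):
    """Rescale a coded second-order coefficient to uncoded units."""
    return coded / (HALF_RANGE[i] * HALF_RANGE[j])


def max_discrepancy():
    """Largest gap between a rescaled coded estimate and the reported value."""
    return max(abs(uncode(c, i, j) - u)
               for (i, j), (c, u) in SECOND_ORDER.items())


if __name__ == "__main__":
    for (i, j), (c, u) in sorted(SECOND_ORDER.items()):
        print(f"x{i}*x{j}: rescaled {uncode(c, i, j):6.2f}   reported {u:6.2f}")
```

Under the assumed half-ranges, every rescaled coefficient agrees with the
reported uncoded estimate to within rounding of the printed values.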

Based on these results, PGP2 appears to have a relatively flat response
surface in the region of optimality. Table 5.11 reports Z(x) = 447.805 at a
coded radius of 1.0 along the minima ridge, compared with Z(x) = 459.002 at the
equivalent distance in the maxima direction. Furthermore, the tolerance analysis
shows the worst-case realization increasing along the minima ridge as well;
consequently, unlike APL1P, no tradeoff exists between increased Z(x) and a more
favorable underlying distribution.
Table 5.11

A Canonical Analysis of PGP2

                            Eigenvectors
Eigenvalues      x1         x2         x3         x4
  14.7865      0.1102     0.6418     0.7017     0.2892
   1.1515     -0.0018     0.6365    -0.7080     0.3059
   0.5485     -0.0050    -0.4216     0.0127     0.9067
   0.3143      0.9939    -0.0721    -0.0790    -0.0270

Estimated Minima Ridge

Coded Radius     x1        x2        x3        x4        Z(x)
    0.0         2.271     4.605     5.045     5.567     447.55
    0.2         2.254     4.641     5.048     5.461     447.62
    0.4         2.233     4.716     5.074     5.364     447.65
    0.6         2.209     4.794     5.096     5.275     447.71
    0.8         2.181     4.871     5.117     5.194     447.76
    1.0         2.150     4.945     5.137     5.124     447.80

Estimated Maxima Ridge

Coded Radius     x1        x2        x3        x4        Z(x)
    0.0         2.271     4.605     5.045     5.567     447.55
    0.2         2.277     4.750     5.206     5.616     449.54
    0.4         2.282     4.903     5.374     5.652     451.77
    0.6         2.287     5.057     5.542     5.687     454.04
    0.8         2.291     5.211     5.711     5.722     456.47
    1.0         2.295     5.365     5.879     5.757     459.00
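
The eigenvalues in Table 5.11 can be cross-checked against the coded
second-order estimates of Table 5.10. The sketch below assumes the usual
response-surface convention in which a symmetric matrix B carries the pure
quadratic estimates on its diagonal and half of each cross-product estimate off
the diagonal. It then extracts the eigenvalues with a small cyclic Jacobi
routine. The function name jacobi_eigenvalues and the matrix layout are
illustrative assumptions; this is not the software used in the original
analysis.

```python
import math


def jacobi_eigenvalues(matrix, tol=1e-18, max_sweeps=100):
    """Eigenvalues of a symmetric matrix by cyclic Jacobi rotations."""
    n = len(matrix)
    a = [row[:] for row in matrix]  # work on a copy
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                diff = a[q][q] - a[p][p]
                # Rotation angle (|theta| <= pi/4) that zeroes a[p][q].
                if diff == 0.0:
                    theta = math.pi / 4.0
                else:
                    theta = 0.5 * math.atan(2.0 * a[p][q] / diff)
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # columns p and q: A <- A J
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):  # rows p and q: A <- J^T A
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)


# Coded second-order estimates from Table 5.10.
QUADRATIC = {1: 0.4900, 2: 6.6563, 3: 7.8589, 4: 1.7955}
CROSS = {(1, 2): 2.0463, (1, 3): 2.2402, (2, 3): 12.2771,
         (1, 4): 0.9195, (2, 4): 5.5193, (3, 4): 5.5160}

B = [[0.0] * 4 for _ in range(4)]
for i, value in QUADRATIC.items():
    B[i - 1][i - 1] = value
for (i, j), value in CROSS.items():
    B[i - 1][j - 1] = B[j - 1][i - 1] = value / 2.0  # split cross terms

eigenvalues = jacobi_eigenvalues(B)
is_positive_definite = all(ev > 0.0 for ev in eigenvalues)

if __name__ == "__main__":
    print([round(ev, 4) for ev in eigenvalues], is_positive_definite)
```

In practice a library routine such as numpy.linalg.eigvalsh would serve the same
purpose; the hand-written rotation loop only keeps the example free of
dependencies. All four eigenvalues being positive is what the text means by a
positive-definite fit.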

Table 5.12

Tolerance Limits for PGP2 Minima Ridge (Random Seed = 34808)

                                                   Population            Tol. Limit
x1, x2, x3, x4†           Z(x)     Mdn. zk     Min zk    Max zk      Min zk    Max zk     % Cvg.
2.27, 4.61, 5.05, 5.67    447.55   466.84      185.07    8719.34     315.83    584.73     .9838
2.25, 4.64, 5.05, 5.46    447.62   466.36      184.56    8802.09     320.12    2250.94    .9900
2.23, 4.72, 5.07, 5.36    447.65   466.24      184.71    8818.87     334.65    575.31     .9706
2.21, 4.79, 5.10, 5.28    447.71   466.16      184.85    8830.72     321.92    558.81     .9683
2.18, 4.87, 5.12, 5.19    447.76   466.08      184.94    8842.56     335.18    702.33     .9804
2.12, 4.95, 5.14, 5.12    447.80   465.89      184.75    8879.04     339.01    761.46     .9547

† - Entries based on coded radius estimates in Table 5.11
PGP2 Analysis Summary. Recommend a near-optimal solution of x1 =
2.271, x2 = 4.605, x3 = 5.045, and x4 = 5.567. If any adjustments in this
solution must be made, then (1) avoid substantial increases in both x2 and
x3; (2) try to significantly reduce x4 and marginally reduce x1 by
proportionally increasing x2 and x3, respectively; and (3) recognize that
any change will likely increase the cost of the worst-case scenario.

Finally, Table 5.13 presents the VRT results for PGP2 for selected
factorial design points from the preliminary experimental design. With the
exception of the second sample, which interestingly possesses a relatively high
Z(x), the LH samples perform remarkably and consistently well. By contrast, the
CV results vary considerably, from nearly 100% reduction to one case of
variance increase. These results suggest that the LH method is the better VRT
for the PGP2 problem. (The CV VRT uses ωj, j = 1, 2, 3 for controls.)


Table 5.13

Comparison of Estimator Accuracy and Variance for
RS, CV, and LH Sampling Techniques for PGP2 (7=50, V=10)

x1, x2, x3, x4            Z(x)    Zrs      s2RS     Zcv      s2CV     %†      ZLH      s2LH     %†
2.25, 4.55, 5.00, 5.50    447.7   452.67   101.1    450.19   113.8    -.13    445.45   6.1      .94
1.75, 4.05, 4.50, 5.00    489.5   525.75   1806.6   505.76   1673.9   .07     507.19   1301.1   .28
2.75, 5.05, 5.50, 6.00    458.8   458.83   95.2     458.68   0.1      1.00    459.65   2.6      .97
1.75, 4.05, 5.50, 6.00    447.9   454.05   222.6    448.65   71.1     .68     447.46   10.7     .95

† - % Variance reduction from RS; also note that a negative value indicates variance increase.
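
The percentage columns of Table 5.13 are consistent with a simple ratio of
sample variances. As a minimal sketch, assuming the reduction is defined as
1 - s2(VRT) / s2(RS), the snippet below recomputes those columns from the
reported variances. The names variance_reduction and TABLE_5_13 are introduced
here for illustration only.

```python
def variance_reduction(s2_rs, s2_vrt):
    """Fractional variance reduction relative to random sampling (RS).

    A negative result means the technique increased the variance.
    """
    return 1.0 - s2_vrt / s2_rs


# (s2RS, s2CV, s2LH) for the four design points of Table 5.13.
TABLE_5_13 = [
    (101.1, 113.8, 6.1),
    (1806.6, 1673.9, 1301.1),
    (95.2, 0.1, 2.6),
    (222.6, 71.1, 10.7),
]

# One (CV reduction, LH reduction) pair per design point, to two decimals.
reductions = [
    (round(variance_reduction(rs, cv), 2), round(variance_reduction(rs, lh), 2))
    for rs, cv, lh in TABLE_5_13
]

if __name__ == "__main__":
    for cv_red, lh_red in reductions:
        print(f"CV: {cv_red:5.2f}   LH: {lh_red:5.2f}")
```

The first design point shows the one case of variance increase for the CV
estimator (a negative reduction), while the LH reductions stay high except at
the second design point.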




5.3  CEP1 


5.3.1  CEP1  Problem  Description 

CEP1 represents a two-stage machine capacity expansion problem donated
to the University of Michigan FTP site by Higle and Sen (1990). The CEP1
recourse formulation possesses the same transportation problem structure seen in
APL1P and PGP2, where again the first-stage variables model a capacity
expansion decision for the supply nodes. However, CEP1 distinguishes itself
from the previous problems in two ways: (1) the first-stage decision costs
possess a piecewise linear structure, and (2) the first-stage variables have
upper bounds. As shown in Figure 5.2, only variables x5 through x8 affect the
recourse problem directly; however, each one's ability to do so beyond 500
depends upon the capacity decision associated with the variable pairings
{x_j, x_{j+4}}, j = 1,..., 4. In other words, for x5 through x8 the first 500
units are free; each additional unit above that point costs an amount associated
with its paired variable. Consequently, the feasible region of CEP1 can be
described in four-dimensional space (x5,..., x8) using a piecewise linear cost
function; e.g., the cost of x_{j+4} = c_j · Max[0, x_{j+4} - 500], j = 1,..., 4.
CEP1 also imposes a constraint on x5,..., x8 with an upper bound of 100. The
nature of this constraint is unknown to the author; hence, the text will refer
to it as the 'joint' constraint.
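
The piecewise linear first-stage cost described above can be sketched in a few
lines. The unit costs are read from the objective row of Figure 5.2, and the
function assumes that only capacity above the free 500 units is charged, as the
text states. The names UNIT_COST, FREE_CAPACITY, and first_stage_cost are
illustrative and do not come from the original formulation.

```python
# Unit costs of x1..x4 as read from the objective row of Figure 5.2.
UNIT_COST = (2.5, 3.75, 5.0, 3.0)
FREE_CAPACITY = 500.0  # units of x5..x8 available at no first-stage cost


def first_stage_cost(x5_to_x8):
    """Piecewise linear cost: sum over j of c_j * max(0, x_{j+4} - 500)."""
    return sum(cost * max(0.0, x - FREE_CAPACITY)
               for cost, x in zip(UNIT_COST, x5_to_x8))


if __name__ == "__main__":
    print(first_stage_cost((500, 500, 500, 500)))  # nothing above the free level
    print(first_stage_cost((0, 0, 2000, 3000)))
    print(first_stage_cost((0, 0, 2500, 2500)))
```

Under these assumptions the points (0, 0, 2000, 3000) and (0, 0, 2500, 2500)
cost 15,000 and 16,000, which coincide with the population minima reported for
those points in Table 5.20. That is what one would expect if the cheapest
scenario incurs no recourse cost, although the text does not say so explicitly.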

MIN   2.5x1 + 3.75x2 + 5.0x3 + 3.0x4

      -x1 + x5                                    ≤ 500
      -x2 + x6                                    ≤ 500
      -x3 + x7                                    ≤ 500
      -x4 + x8                                    ≤ 500
      .08x5 + .04x6 + .03x7 + [illegible]x8       ≤ 100
      x5                                          ≤ 2000
      x6                                          ≤ 2000
      x7                                          ≤ 3000
      x8                                          ≤ 3000
      -.8x5 + W4· y                               ≤ 0
      -x6 + W5· y                                 ≤ 0
      -x7 + W6· y                                 ≤ 0
      -x8 + W7· y                                 ≤ 0

Figure 5.2. CEP1 Formulation of First-Stage Variables
(Wi· Represents the ith Row of W. Constraints Without Wi· Comprise Ax ≤ b)

The bounds on x5 through x8 also present a unique modification of the
response surface analysis by further constraining the feasible region Ax = b
with 0 ≤ x ≤ Ux (where Ux represents their upper limits). In APL1P and PGP2,
first-stage decision variables without upper bounds allow for a response such
that, beginning with any non-optimal feasible x and proceeding in any descending
search direction, Z(x) will initially decrease due to the combined effects of
(1) decreasing recourse costs disproportionately offsetting the increasing
expense of those x_j rising in value; and (2) reduced recourse and first-stage
costs for decreasing x_j. This marginal cost reduction continues until reaching
equilibrium at Z(x*), after which the previous effects reverse themselves and
drive Z(x) back up: any additional supply from increasing x_j becomes surplus
resources in the recourse problem, while any decreasing x_j cannot offset the
additional cost of resource shortages. As the analysis will shortly show, the
upper bounding in CEP1 does not allow x to reach this equilibrium state between
changes in x and the recourse costs, resulting in a truncated response surface
in the {x5,..., x8} space.

Finally, like PGP2, CEP1 restricts its variation to the recourse right-side
vector ω. The demand scenarios' sum total of 216 represents the lowest number
of possible realizations of all the problems investigated by this dissertation.
CEP1 also models surplus power availability to guarantee complete recourse for
any value of x and realization of ω.

5.3.2  CEP1  Optimization  Results 

The OBS algorithm found 42 optimal bases and associated dual vectors for
CEP1 (Table 5.14), with a frequency of occurrence listed in Table 5.15. In this
particular problem the search techniques found six additional bases while using
the OBS-Complete method, which occurred primarily when x8 = 3000 and one or
more realizations of ωj = 0. This result implies that the undirected Monte Carlo
search of the feasible space in the OBS algorithm did not sample this region
adequately, and suggests that additional optimal bases may remain undetected.
(This analysis of CEP1 uses the ODV method after the OBS-Complete algorithm;
thus, that algorithm's use of the expanded dual vector set ensures unbiased
results.)

In  a  related  matter,  the  OBS-Complete  technique  turns  in  a  better 
performance  than  the  ODV  method  for  all  three  optimal  search  techniques  (Table 
5.16),  although  again  both  outperform  the  OSL  option  by  an  order  of  magnitude. 
Table 5.14

OBS-Complete Results for CEP1

Sample Size    Random #      All Bases /       # Opt. Bases /    CPU Time
(ω - Tx)       Seed          First Optimal*    Dual Vectors      (secs)
1000           18975         All               28                27.06
5000           913342061     All               33                55.48
50,000         159568        First             42                242.75
250,000        506886247     First             42†               120.88

* - 'First Optimal' option skips any remaining bases after finding first feasible, whereas 'All
Bases' checks every basis in P for each sample (ω - Tx).

† - This run verified the third run using a larger sampling space. Subsequent search routines
found 6 additional optimal bases.


Table 5.15

Frequency of Basis Optimality for CEP1
(Based on 4th Run from Table 5.14)

Basis ID #    Freq. of Optimality    Cumulative Freq.*
1             .29                    .29
2             .13                    .42
3             .09                    .51
4             .09                    .60
5             .07                    .67
6             .04                    .71
7             .04                    .75
8             .03                    .78
9             .03                    .81
10            .03                    .84
11            .02                    .86
12            .02                    .88
13-42         .12                    1.00

* - May not add due to roundoff error.
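
The cumulative column of Table 5.15 indicates how quickly a 'First Optimal'
search can stop. The sketch below recomputes the running totals and counts how
many leading bases are needed to reach a target coverage. It assumes the bases
are tried in the order listed (most frequent first), which the text does not
state. The names FREQ, cumulative, and bases_needed are illustrative.

```python
from itertools import accumulate

# Frequencies of optimality for bases 1-12 and the pooled bases 13-42 (Table 5.15).
FREQ = [.29, .13, .09, .09, .07, .04, .04, .03, .03, .03, .02, .02, .12]

# Running totals, rounded to the two decimals printed in the table.
cumulative = [round(total, 2) for total in accumulate(FREQ)]


def bases_needed(target):
    """Number of leading table entries whose cumulative frequency reaches target.

    The final entry pools bases 13-42, so counts above 12 are only a lower bound.
    """
    for count, total in enumerate(cumulative, start=1):
        if total >= target:
            return count
    return None


if __name__ == "__main__":
    print(cumulative)
    print(bases_needed(2 / 3))
```

With the tabulated frequencies, the first five bases already cover about
two-thirds of the sampled realizations.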


Table 5.16

Computation Times of OSL/OBS/ODV Options for Geometric Simplex,
Projected Gradient, and PARTAN Algorithms for CEP1 (in seconds)

Algorithm                               OSL       OBS*     ODV
GEOMETRIC SIMPLEX (100 Iterations)      244.53    26.74    32.12
PROJECTED GRADIENT (6 Iterations)       96.87     6.89     8.03
PARTAN (7 Iterations)                   261.62    18.90    21.43

* - Each search technique found two additional optimal bases.


This performance advantage probably results from the ODV algorithm having to
check every array of a larger set of dual vectors (46), while the first 5
optimal bases provide a feasible answer two-thirds of the time (on average) for
the OBS-Complete method. Consequently, given (1) the marginal loss of coverage
by the initial optimal basis set, (2) the ability of OBS-Complete to recognize
infeasibility and supplement the basis set, and (3) its slight performance edge
over the ODV method, the results of CEP1 support using the OBS-Complete
technique.

Regarding the search techniques, Tables 5.16 and 5.18 also show the
Projected Gradient Algorithm clearly outperforming both the Geometric
Simplex and PARTAN methods. Again, because of the small number of scenarios,
all search techniques calculate the true response Z(x). The Geometric Simplex
Algorithm (Table 5.17) especially runs into difficulties with CEP1 due to the
eventual collapse of the simplex into the expected value approximation xev.
Indeed, during the first 100 iterations the simplex finds an expansion move to
replace the fifth vertex every time, and never initiates a
shrinkage-enlargement-shrinkage cycle. This phenomenon occurs due to a
combination of several factors: (1) the small coverage area of the collapsed
simplex; (2) its location near the lower or upper bounds of two of the five
variables (counting the slack); (3) sampling the entire population of responses
zk due to the small number of scenarios; and (4) the relatively steep slope of
the response in this region of x. Consequently, the simplex (lacking any real
directional data) tends to move incrementally closer to the feasible boundaries
using relatively smaller projection vectors, and never cycles through the
vertices since the steepness of the response and use of the true response Z(x)
(versus an estimate) almost guarantees a marginal improvement over the value of
the next-worst vertex.
Table 5.17

Selected Geometric Simplex Moves for CEP1
(Random Seed = 354644707)

k      x5, x6, x7, x8          Z(x)*     Simplex Move           Replaces
0      0, 500, 1666, 3000      367014    Initial Vertex (xev)   --
1      14, 1000, 823, 1172     867790    Expansion xe           5
2      28, 1142, 978, 1095     806841    Expansion xe           5
3      41, 718, 1344, 1530     704261    Expansion xe           5
4      10, 829, 1213, 1931     612543    Expansion xe           5
5      21, 767, 1371, 1995     573611    Expansion xe           5
10     2, 565, 1587, 2797      404966    Expansion xe           5
20     0, 500, 1665, 2995      368060    Expansion xe           5
40     0, 500, 1666, 3000      367014    Expansion xe           5
60     0, 500, 1666, 3000      366972    Expansion xe           5
100    0, 500, 1666, 3000      366951    Expansion xe           5

* - Differences in Z(x) due to fractional components of x.


By contrast, the Projected Gradient Algorithm (Table 5.18) performs
very well, although it requires inputs of 1.0 for the scalar multiple of the
projection vector in every case. The behavior of this algorithm provides the
clearest evidence of the truncated nature of the response surface for CEP1 in
the following ways.

1. Quadratic Estimates. The quadratic regression of the projection vector fits
extremely well at every iteration; furthermore, the estimated scalar multiple
exceeds 1.0 in each case as well. This implies that the equilibrium point
remains well below the current optimal solution.




Table 5.18

Projected Gradient Iterations for CEP1
(Scenarios = 216, Q = 8)

k    x5, x6, x7, x8           dk                      Est. q*   Act. q   R2     Z(x)
1    500, 500, 500, 500       -.23, .33, .36, .49     5.2       1.0      .999   1,234,278
2    31, 1155, 1218, 1478     -.31, .26, .33, .49     19.9      1.0      .999   640,930
3    0, 1181, 1250, 1526      0, -.13, .04, .39       3.5       1.0      .999   618,832
4    0, 693, 1409, 3000       0, -.05, .06, 0         1.4       1.0      .999   374,482
5    0, 0, 2333, 3000         0, 0, -.12, .35         1.3       1.0      1.00   355,160
6    0, 0, 2333, 3000*        0, 0, 0, 0              --        --       --     355,160

* - Optimal solution.


2. Descent Gradient. Table 5.18 reports the normalized projected descent
gradient d6 = 0, thus implying an optimal solution since d6 is derived from
the true unconstrained descent gradient -∇Z(x6). Given the non-differentiable
property of E[h(x,ω,T)], such a condition can occur either through multiple
optimality or binding constraints as expressed in the working set. Furthermore,
the unprojected descent gradient -∇Z(x*) remains relatively large (-83, -189,
-155, and -170 for x5 through x8, respectively) at the optimal solution. This
indicates that an unconstrained environment would allow further reductions in
Z(x) from the current position.

Unlike the Geometric Simplex Algorithm, the Projected Gradient
Algorithm's strong directional capabilities, coupled with scalar estimates
enhanced by the lack of a flat region, find the optimal solution quickly.
The PARTAN Algorithm also performs well, although it suffers
somewhat from the presence of upper bounds on the first-stage decision variables
(Table 5.19). Recalling that the PARTAN method derives its directional guidance
from an alternating gradient and line search approach, PARTAN momentarily
stalls when x_{k-1} and p_k approach the same point (0, 693, 1409, 3000).
However, unlike the Geometric Simplex Algorithm, PARTAN regains its bearings
due to its gradient capabilities, and proceeds to the optimal solution
relatively quickly. Unfortunately, the assumed advantage of the PARTAN approach
(theoretical convergence in n-1 iterations for a quadratic function of n
parameters) is not realized in this case. Indeed, any advantage PARTAN has in
this instance comes from its projected gradient component, which the Projected
Gradient Algorithm itself provides more directly with better results.

Finally,  it  should  be  noted  that  using  the  optimal  basis  or  dual  vector  sets 
can  give  slightly  different  directional  descent  information  than  OSL  due  to  the 
likelihood  of  multiple  optimality  for  certain  values  of  x*,  which  in  turn  can  affect 
the  convergence  rate  of  the  search.  The  most  notable  example  of  this 
phenomenon  occurs  with  the  PARTAN  Algorithm,  where  the  OBS-Complete 
version  takes  two  additional  iterations  at  the  end  to  confirm  optimality. 
Additionally, the estimates of the quadratic fit vary considerably between the
OSL, OBS-Complete, and ODV versions on the last iteration. This variation
occurs because the combination of short distances and slightly different
projection vectors produces different versions of near-optimal sample values of
Z(x).
Table 5.19

PARTAN Iterations for CEP1 (OSL and ODV) (Scenarios = 216, Q = 8)

k     x5, x6, x7, x8           Est. q*   Act. q   R2     Z(x)
x0    500, 500, 500, 500       4.1       1.0      .999   1,234,278
x1    31, 1155, 1218, 1478     4.2       1.0      .999   640,930
p1    0, 1181, 1250, 1526      3.9       1.0      .999   618,832
x2    0, 1181, 1250, 1526      2.3       1.0      .999   618,832
p2    0, 693, 1409, 3000       2.3       1.0      .999   374,482
x3    0, 693, 1409, 3000       2.3       1.0      .999   374,482
p3    0, 693, 1409, 3000       2.3       1.0      .999   374,482
x4    0, 693, 1409, 3000       2.3       1.0      .999   374,482
p4    0, 693, 1409, 3000       2.4       1.0      .999   374,482
x5    0, 693, 1409, 3000       1.1       1.0      .999   374,482
p5    0, 0, 2333, 3000         1.2       1.0      .999   355,160
x6    0, 0, 2333, 3000         2.8       1.0      .999   355,160
p6    0, 0, 2333, 3000         1.1       1.0      .999   355,160
x7    0, 0, 2333, 3000         --        --       --     355,160


5.3.3  CEP1  Response  Surface  Analysis 

The  statistical  analysis  of  CEP1  becomes  somewhat  abbreviated  due  to  the 
nature  of  the  optimal  solution  imposed  by  the  upper  bounds  on  the  decision 
variables.  The  fundamental  assumption  made  by  this  analysis  —  the  existence  of 
multiple-optimal  or  near-optimal  solutions  within  a  'flat'  region  —  obviously  does 
not  occur  in  this  case.  Instead,  the  optimal  solution  lies  on  the  'side'  of  the 
unconstrained  region,  prevented  from  moving  towards  the  equilibrium  point  at  the 
'bottom'  by  the  bounds  imposed  on  the  decision  variables.  Therefore,  any 
movement  in  any  feasible  direction  away  from  the  optimal  solution  causes  a  steep 
increase in Z(x). By contrast, any relaxation of the upper bound constraints on
x7 or x8 produces a considerable decrease in Z(x).

However,  this  result  does  not  preclude  describing  the  distributional  nature 
of  the  current  optimal  solution  using  tolerance  limits.  Indeed,  an  analysis  of 
several  feasible  solutions  reveals  a  very  skewed  frequency  distribution  as 
indicated  by  the  large  discrepancies  between  Z(x)  and  the  median  (Table  5.20). 
Figure  5.3  on  the  following  page  reinforces  this  description  with  a  graphical 
portrait  of  the  population  distribution  for  the  optimal  solution.  Although  Z(x) 
remains  the  best  measure  of  the  long-term  operating  costs,  the  tolerance  limit 
analysis  clearly  shows  that  occurrences  requiring  much  higher  expenditures  will 
very  likely  occur  over  the  lifetime  of  this  problem. 

Although a formal response surface approximation as originally intended
for this type of problem cannot be performed for CEP1, the insight obtained
can still be presented to the decision-maker.


Table 5.20

Tolerance Limits for CEP1 (Random Seed = 221789)

                                                   Population              Tol. Limit
x5, x6, x7, x8          Z(x)       Mdn. zk     Min zk    Max zk        Min zk    Max zk       % Cvg.
0, 0, 2333.33, 3000     355,160    154,013     16,667    1,833,413     18,542    1,593,938    .9907
0, 0, 2000, 3000        408,826    269,775     15,000    1,950,350     17,000    1,710,350    .9815
0, 0, 2500, 2500        419,729    252,432     16,000    1,931,844     17,875    1,691,844    .9861
0, 0, 2666.66, 2000     493,034    350,800     15,333    2,030,275     15,333    1,790,800    .9954

Figure 5.3. Comparison of Tolerance Limits to Population Distribution,
Mean, and Median for xk = (0, 0, 2333.33, 3000)

CEP1 Analysis Summary. The optimal solution x7 = 2333.33 and
x8 = 3000 can be substantially improved by relaxing the upper bounds on
either x8 or the 'joint' constraint. Any feasible deviation from the optimal
solution under current constraints will considerably increase Z(x) and
most likely raise the cost of the worst-case scenario. Finally, scenarios
costing more than four times the average are possible.

Both VRT methods significantly reduce the variance of the estimators of
Z(x) in the case of CEP1. Table 5.21 shows the results where, unlike PGP2, the
CV technique remains competitive with the LH approach (the CVs use all three ωj
as controls). In both PGP2 and CEP1, the highly skewed nature of the underlying
distribution most likely accounts for the reduced VRT effectiveness when
compared to APL1P. Nonetheless, both techniques offer considerable
improvement over the random sample estimator.


Table 5.21

Comparison of Estimator Accuracy and Variance for
RS, CV, and LH Sampling Techniques for CEP1

x5, x6, x7, x8        Z(x)      Zrs       s2RS*    Zcv       s2CV*    %†     ZLH       s2LH*    %†
0, 0, 2333, 3000      355160    361058    3.212    354411    .519     .84    346827    .365     .89
0, 0, 2000, 3000      408826    406321    5.714    397896    .784     .86    407738    .570     .90
0, 0, 2500, 2500      419729    415518    1.689    419349    .431     .75    412513    1.02     .39
0, 0, 2666, 2000      493034    522554    2.180    497606    .203     .91    513481    .287     .87

* - in billions.   † - % Variance reduction from RS


5.4  4TERM

5.4.1  4TERM  Problem  Description 

4TERM models a vehicle allocation problem between a central depot and
four outlying terminals. The vehicles are single tractors with a one- or
two-trailer configuration, while demand constitutes the stochastic right-side
elements modeling daily pick-up and delivery requirements at each of the four
terminals (for a total of eight independent right-side random variables). Each
random variable can take one of two discrete values with equal probability, thus
providing 256 possible demand scenarios. The first-stage decision variables
model the allocation (or basing) of the existing fleet of 300 trailers and 200
tractors among the five locations using the constraints

$$\sum_{j=1}^{5} x_j = 300 \quad \text{(Trailers)} \tag{5.1a}$$

$$\sum_{j=6}^{10} x_j = 200 \quad \text{(Tractors)} \tag{5.1b}$$

while the availability of single tractor-trailer combinations for daily rent to
supplement the existing fleet is represented as

$$\sum_{j=11}^{15} x_j \le 10{,}000 \quad \text{(Rental Tractor-Trailer).} \tag{5.2}$$

The cost of using x_j, j = 11, ..., 15 is c_j = 100, while the existing fleet's
expenses are zero under this model; i.e., c_j = 0, j = 1, ..., 10. Equations
(5.1) and (5.2) constitute Ax = b, and x ≥ 0.
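
The first-stage structure of 4TERM is small enough to sketch directly. The
example below enumerates the 2^8 equally likely demand scenarios and checks a
candidate allocation against constraints (5.1a), (5.1b), and (5.2). The two
demand levels and the example allocation are hypothetical placeholders, since
the actual data are not reproduced in this section. The names DEMAND_LEVELS,
is_feasible, rental_cost, and x_example are likewise illustrative.

```python
from itertools import product

N_RANDOM_VARS = 8  # pick-up and delivery demand at each of four terminals
# Hypothetical (low, high) demand levels; the real values are not given here.
DEMAND_LEVELS = (10, 20)

# Every combination of low/high demand across the eight random variables.
scenarios = list(product(DEMAND_LEVELS, repeat=N_RANDOM_VARS))
probability = 1.0 / len(scenarios)  # realizations are equally likely


def is_feasible(x):
    """Check (5.1a), (5.1b), (5.2), and x >= 0 for x = (x1, ..., x15)."""
    if len(x) != 15 or any(value < 0 for value in x):
        return False
    trailers, tractors, rentals = x[0:5], x[5:10], x[10:15]
    return (sum(trailers) == 300
            and sum(tractors) == 200
            and sum(rentals) <= 10_000)


def rental_cost(x):
    """First-stage cost: 100 per rented tractor-trailer, 0 for the owned fleet."""
    return 100 * sum(x[10:15])


# Hypothetical allocation: spread the fleet evenly, rent two units per location.
x_example = [60] * 5 + [40] * 5 + [2] * 5

if __name__ == "__main__":
    print(len(scenarios), probability)
    print(is_feasible(x_example), rental_cost(x_example))
```

Because only the rental variables carry a first-stage cost, any two feasible
allocations with the same number of rentals differ only through their expected
recourse cost.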

The T matrix deterministically allocates the decision variables among the
four terminals and central depot without any gains, losses, or stochastic
representation; e.g., x1 (trailers) and x6 (tractors) model the transport
resource availability of the central depot. A single recourse right-side
variable models each type of resource (tractor or trailer) separately; thus,
x_j, j = 1, ..., 10 correspond directly to their own resource element. The
rental decision vector also transfers to the recourse right-side under the same
conditions, with the exception that as a tractor-trailer package it adds
resources to two separate elements; e.g., x11 adds one trailer and one tractor
to the central depot resources supplied by x1 and x6, respectively.
Mathematically, the T matrix can be expressed as

$$-x_j - x_{j+10} = 0, \quad j = 1, \ldots, 5 \tag{5.3a}$$

$$-x_j - x_{j+5} = 0, \quad j = 6, \ldots, 10. \tag{5.3b}$$

The  recourse  model  attempts  to  retain  the  same  home-base  allocation  represented 
by  the  first-stage  vector  x;  however,  penalty  costs  of  1,000  per  tractor  or  trailer 
allow  for  complete  recourse  if  there  is  an  insufficient  number  of  vehicles  or  if 
mismatched  demand  occurs. 

5.4.2  4TERM  Optimization  Results 

Unlike  the  previous  problems,  4TERM  does  not  lend  itself  to  a 
manageable  number  of  optimal  bases  for  the  OBS-Complete  or  ODV  algorithms; 
consequently,  OBS-Reset  remains  the  only  computational  alternative  to  OSL  for 
calculating  Z(x).  In  this  case,  the  algorithm  resets  the  optimal  basis  and  dual 
vector  sets  to  zero  for  each  individual  x*,  which  in  turn  provides  a  noticeable 
reduction  in  computation  times.  However,  as  Table  5.22  shows,  the  amount  of 
reduction  does  not  approach  the  OBS-Complete/ODV  results  of  the  other 
problems (assuming its availability). Although several x_k require a unique basis
to cover each of the possible 256 scenarios, most only need between 50 and 60.
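
As a quick check on the size of this reduction, the following sketch computes the ratio of OSL time to OBS-Reset time for each algorithm, using the timings reported in Table 5.22. The dictionary layout and function name are illustrative assumptions.

```python
# Timings in seconds, as reported in Table 5.22.
TIMES = {
    "Geometric Simplex": {"OSL": 848.59, "OBS-Reset": 262.52},
    "Projected Gradient": {"OSL": 125.39, "OBS-Reset": 91.19},
    "PARTAN": {"OSL": 591.60, "OBS-Reset": 449.11},
}


def speedup(algorithm):
    """Ratio of OSL time to OBS-Reset time (values above 1 favor OBS-Reset)."""
    t = TIMES[algorithm]
    return t["OSL"] / t["OBS-Reset"]


for name in TIMES:
    print(f"{name}: OSL / OBS-Reset = {speedup(name):.2f}")

# Geometric Simplex versus Projected Gradient under the OSL option.
simplex_vs_gradient = TIMES["Geometric Simplex"]["OSL"] / TIMES["Projected Gradient"]["OSL"]
```

The ratios are roughly 3 for the simplex search and under 1.5 for the two gradient-based searches, while the simplex takes nearly seven times as long as the projected gradient under OSL.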

Table 5.22

Computation Times of Selected Options for Geometric Simplex, Projected
Gradient, and PARTAN Algorithms for 4TERM (in seconds)

Algorithm                              OSL       OBS-Reset
Geometric Simplex (100 Iterations)     848.59    262.52
Projected Gradient (13 Iterations)     125.39     91.19
PARTAN (27 Iterations)                 591.60    449.11

Table 5.23 presents the Geometric Simplex Algorithm results for
4TERM. Again due to the small number of scenarios, all the search methods find
the true response Z(x). Unlike previous problems, the simplex avoids a
premature collapse into the vertex x_ev; indeed, the terminating vertex x_100's
objective  function  value  of  42369  represents  a  substantial  improvement  over 
Z(xev).  However,  comparing  the  vertices  of  the  last  simplex  indicates  that  a  slow 
convergence  begins  in  the  latter  stages  of  the  search  similar  to  what  occurs  with 
PGP2  and  CEP1.  Furthermore,  the  simplex's  computational  time  already 
approaches  an  order  of  magnitude  higher  (OSL  option)  than  the  Projected 
Gradient  Algorithm's  while  still  somewhat  far  from  a  near-optimal  solution. 
This  persistent  tendency  of  the  simplex  to  prematurely  converge  casts  doubt  on  its 
ability  to  continue  towards  the  region  of  optimality  in  a  reasonable  amount  of 
time.  Finally,  the  Geometric  Simplex  Algorithm  tends  to  drop  the  x11,...,  x15 
values  at  a  steady  pace.  As  seen  shortly,  the  Projected  Gradient  Algorithm 
initiates  a  similar  steep  drop  in  these  decision  variables,  thus  validating  the 
simplex's  sensitivity  to  the  constraint  (5.2).  This  inclination  of  x11  through  x15  to 
drop  to  zero  early  will  essentially  eliminate  them  from  the  experimental  design  of 
4TERM.

Table 5.23

Selected Geometric Simplex Moves for 4TERM
(Random Seed = 800598600)

Each x_k is listed on three lines: x1..x5, x6..x10, x11..x15.

  k    x_k                             Z(x)     Simplex Move           Replaces

  0    192, 20, 50, 8, 30              53313    Initial Vertex (x_ev)  -
       146, 10, 25, 4, 150
       0, 0, 0, 0, 0

  1    71, 51, 110, 35, 33             699702   Expansion x_e          16
       75, 38, 22, 17, 47
       904, 1881, 1631, 398, 1829

  10   85, 28, 128, 23, 36             496764   Expansion x_e          16
       83, 41, 17, 12, 46
       879, 881, 612, 1025, 1216

  20   118, 35, 80, 11, 56             296387   Expansion x_e          16
       102, 23, 16, 13, 46
       530, 414, 545, 506, 614

  30   162, 25, 62, 17, 35             218909   Expansion x_e          16
       123, 14, 16, 18, 30
       512, 311, 310, 375, 326

  40   186, 20, 46, 11, 38             125958   Expansion x_e          16
       124, 8, 15, 20, 33
       300, 113, 171, 118, 203

  50   199, 20, 39, 10, 32             101767   Expansion x_e          16
       129, 8, 15, 21, 27
       287, 105, 85, 84, 101

  60   201, 18, 40, 11, 30             80950    Expansion x_e          16
       134, 7, 17, 18, 24
       228, 62, 29, 41, 93

  80   193, 20, 47, 8, 31              49609    Expansion x_e          16
       141, 10, 23, 8, 19
       50, 29, 16, 18, 29

  90   193, 20, 48, 8, 31              46221    Expansion x_e          16
       142, 9, 23, 7, 18
       37, 19, 20, 14, 17

  100  192, 20, 49, 8, 31              42369    Expansion x_e          16
       143, 10, 24, 6, 17
       15, 13, 10, 13, 14

Fortunately, as in previous problems the Projected Gradient Algorithm
performs extremely well, reaching an optimal solution in 13 iterations. Indeed, as
Table 5.24 shows, the algorithm descends very rapidly
towards  the  region  of  optimality  in  the  first  seven  iterations  before  taking  another 
six  steps  to  the  optimal  solution.  Unlike  APL1P  and  PGP2,  the  unprojected 
directional  descent  vector  at  optimality  is  zero,  thus  implying  a  'flat'  region 
composed  of  multiple  optimal  solutions.  Subsequent  searches  —  specifically 
PARTAN  and  those  conducted  for  the  experimental  design  centerpoint  explained 
shortly  —  confirmed  multiple  optimality  by  finding  different  solutions  in  a  similar 
number  of  steps.  Furthermore,  the  Projected  Gradient  Algorithm,  like  all 
search  algorithms  this  research  applies  to  4TERM,  immediately  begins 
eliminating  the  rental  tractor-trailer  variables  (x11,...,  x15).  Typically,  these 
variables  drop  to  zero  well  before  Z(x)  approaches  100,000,  thus  providing  the 
basis  for  eliminating  x11,...,  x15  from  the  final  experimental  design.  In  effect,  the 
Projected  Gradient  Algorithm  not  only  finds  the  optimal  solution,  but 
performs  a  factor  screening  function  as  well. 

As  in  the  previous  problems,  the  PARTAN  Algorithm  (Table  5.25) 
converges  to  an  optimal  solution  as  well.  Following  the  same  pattern  as  before, 
the  parallel  tangent  property  does  not  appear  to  supply  any  additional  advantages 
over  using  the  projected  gradient  portion  alone.  However,  starting  from  the  same 
initial  point  as  the  Projected  Gradient  Algorithm,  the  PARTAN  Algorithm 
finds a different optimal solution, thus confirming multiple optimality of 4TERM.

Table 5.24

Selected Projected Gradient Iterations for 4TERM
(Scenarios = 256, Q = 8)†

Each iterate x and direction d is listed on three lines: elements 1..5, 6..10, 11..15.

  k   x                      d                                Est. q  Act. q  R^2   Z(x)

  1   60, ..., 60            0, ..., 0                        6.8     1.0     1.0   1035k
      40, ..., 40            0, ..., 0
      2k, ..., 2k            -.07, -.07, -.07, -.07, -.07

  2   60, ..., 60            .58, -.15, -.15, -.15, -.15      .31     .31     .99   73795
      40, ..., 40            0, ..., 0
      0, ..., 0              0, ..., 0

  3   135, 41, 41, 41, 41    -.01, -.01, .03, -.01, -.01      .31     .31     .98   52535
      40, ..., 40            .58, -.14, -.14, -.14, -.14
      0, ..., 0              0, ..., 0

  4   134, 41, 44, 41, 41    -.15, -.15, .59, -.15, -.15      .13     .13     .99   41096
      90, 28, 28, 28, 28     0, ..., 0
      0, ..., 0              0, ..., 0

  5   129, 35, 66, 35, 35    -.07, -.07, -.07, -.07, .28      .23     .23     .99   38930
      90, 28, 28, 28, 28     -.14, -.14, .54, -.14, -.14
      0, ..., 0              0, ..., 0

  6   125, 32, 62, 32, 48    .17, -.04, -.04, -.04, -.04      .05     .05     .99   35723
      83, 21, 53, 21, 21     0, ..., 0
      0, ..., 0              0, ..., 0

  7   132, 30, 61, 30, 47    .00, -.00, -.00, -.00, -.00      .01     .01     .99   35518
      83, 21, 53, 21, 21     0, ..., 0
      0, ..., 0              0, ..., 0

  13  137, 32, 64, 21, 47*   0, ..., 0                        -       -       -     35514
      83, 21, 53, 21, 21     0, ..., 0
      0, ..., 0              0, ..., 0

* - Optimal solution. † - k represents 1,000.

Table 5.25

Selected PARTAN Iterations for 4TERM (Scenarios = 256, Q = 8)

  k     x1 ... x5              x6 ... x10            x11 ... x15   Est. q*  Act. q  R^2   Z(x)

  x0    60, ..., 60            40, ..., 40           2k, ..., 2k   6.75     1.00    1.0   1035014
  x1    60, ..., 60            40, ..., 40           0, ..., 0     .312     .21     .99   73795
  p1    111, 47, 47, 47, 47    40, ..., 40           0, ..., 0     15.18    1.00    .99   52033
  p2    111, 46, 50, 46, 46    89, 28, 28, 28, 28    0, ..., 0     .361     .361    .99   39481
  p3    117, 41, 60, 41, 41    98, 26, 26, 26, 26    0, ..., 0     .306     .306    .98   38864
  x7    118, 40, 61, 40, 40    75, 18, 58, 18, 32    0, ..., 0     .06      .06     .99   36131
  p18   128, 30, 65, 30, 47    75, 18, 59, 18, 30    0, ..., 0     .06      .01     .99   35611
  p23   133, 33, 62, 27, 45    75, 18, 59, 18, 30    0, ..., 0     .13      .13     .99   35517
  x26   139, 32, 60, 26, 43    75, 18, 59, 18, 30    0, ..., 0     -        -       -     35514

5.4.3 4TERM Response Surface Analysis

Several  aspects  of  4TERM  pose  additional  challenges  to  fitting  a 
polynomial  approximation  to  Z(x);  specifically,  the  dimension  of  the  decision 
vector  x,  the  tractor/trailer  equality  constraints  (5.1),  and  its  multiple  optimal 
solutions.  As  indicated  already,  the  gradient  search  techniques  provide  strong 
evidence  that  x11  through  x15  simply  do  not  assume  a  role  in  the  region  of  optimal 
or  near-optimal  solutions.  If  any  doubt  persists  regarding  such  an  assumption,  a 
factor  screening  design  could  help  verify  these  results.  However,  this  research 
considers  the  evidence  from  the  gradient  methods  strong  enough  for  the  response 
analysis  to  proceed  without  considering  the  rental  tractor-trailer  decision 
variables. 

Eliminating  these  five  variables  reduces  the  remaining  number  of  variables 
under  consideration  to  ten  (x1  through  x10).  However,  the  equality  constraints 
(5.1)  do  not  allow  the  necessary  degrees  of  freedom  to  construct  a  10-variable 
central  composite  design;  indeed,  two  variables  —  one  each  from  the  tractor  and 
trailer  groupings  —  must  be  'thrown  out'  in  order  to  construct  a  CCD  design  on 
the  remaining  eight.  In  effect,  this  reduction  projects  the  true  response  onto  the 
hyperplane  defined  by  the  remaining  variables,  thus  providing  a  framework  for 
applying  'standard'  designs.  This  in  turn  suggests  making  a  priori  judgments  on 
which  projection  would  undergo  the  least  distortion,  and  could  consider  such 
items  as  constraint  coefficients,  solution  comparisons,  or  subjective  interests.  In 
4TERM, this research eliminates x1 and x6 due to (1) their large values; (2)
subjective interest in the outlying terminals (x1 and x6 represent the central
terminal's trailer and tractor capacity, respectively); and (3) their relatively small
variation among optimal solutions. (Other alternatives not explored by this
research for handling input factor linear dependence include simplex-lattice
designs (Scheffé 1958) and special linear transformations (Draper and Lawrence
1965a, b; Thompson and Myers 1968). Cornell (1973, 1979, 1981) also reviews
these approaches with an accompanying bibliography.)

Multiple  optimal  solutions  pose  another  challenge  to  the  accuracy  of  the 
response surface approximation. Preliminary experimental designs varying in
size, but all using the Projected Gradient Algorithm's optimal solution, gave
polynomial approximations with either an indefinite fit (one of the eight
eigenvalues slightly negative) or an inadequate fit (R^2 values below .8). Since the
presence of multiple optimality implies that the centerpoint can lie anywhere in a
flat region, these results suggest the optimal solution from the Projected
Gradient Algorithm lies near the 'edge' of the region, and that a better fit can be
derived using a more centrally located optimal solution. Such a centerpoint can
be found using a convex combination of additional optimal solutions discovered
by the Projected Gradient Algorithm under different initial starting points.
Table 5.26 presents five such optimal solutions; since x*_3 and x*_5 represent the two
solutions furthermost apart (x*_1, x*_2, and x*_4 are fairly close), the design centerpoint
x*_c represents an average of those two extremes.
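
The centerpoint construction can be reproduced with a short sketch. The two solution vectors and the weights of .5 are taken from Table 5.26; the function name is an illustrative assumption.

```python
def convex_combination(solutions, weights):
    """Weighted average of solution vectors; weights must be >= 0 and sum to 1."""
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be nonnegative and sum to one")
    n = len(solutions[0])
    return [sum(w * s[i] for w, s in zip(weights, solutions)) for i in range(n)]


# Optimal solutions x*_3 and x*_5 (x1..x10) as listed in Table 5.26.
x3 = [151, 32, 60, 15, 42, 78, 19, 35, 19, 49]
x5 = [137, 47, 61, 15, 40, 108, 24, 35, 8, 25]

xc = convex_combination([x3, x5], [0.5, 0.5])
```

Because the feasible region is convex and both endpoints satisfy the equality constraints (5.1), any such combination also satisfies them; here the trailer elements of xc still sum to 300 and the tractor elements to 200.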

Table 5.26

Derivation of Experimental Design Centerpoint x*_c

  x*_k    x1    x2     x3     x4   x5   x6    x7     x8   x9     x10   a_k

  x*_1    137   32     64     21   46   84    21     53   21     21    .0
  x*_2    138   39     66     15   42   69    41     30   40     20    .0
  x*_3    151   32     60     15   42   78    19     35   19     49    .5
  x*_4    137   30     61     15   57   87    17     40   29     27    .0
  x*_5    137   47     61     15   40   108   24     35   8      25    .5
  x*_c    144   39.5   60.5   15   41   93    21.5   35   13.5   37

The final experimental design (Table 5.27) employs 64 runs in the factorial
portion of the CCD (a quarter of the 2^8 = 256 full factorial), 16 axial points with
coded multipliers of 2, and one centerpoint. The design possesses a resolution
level of V, which ensures no two-way interaction confounding, and follows a
design generator from Lorenzen and Anderson (1993). A full-factorial design for
six variables (x2, x3, x4, x7, x8, and x9) provides the structure for the 2^6 = 64
factorial runs; then x5's coded value is set by multiplying the coded values of
x2, x4, x8, and x9, while multiplying x3, x4, x7, x8, and x9 determines x10.
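
The generator rule just described can be checked with a small sketch that builds the 64 factorial runs and computes the length of the shortest word in the defining relation. The variable names follow the text; the helper functions are illustrative assumptions, and the resolution check is written for exactly two generators.

```python
from itertools import product

BASE = ["x2", "x3", "x4", "x7", "x8", "x9"]
GENERATORS = {
    "x5": ["x2", "x4", "x8", "x9"],
    "x10": ["x3", "x4", "x7", "x8", "x9"],
}


def build_factorial_runs():
    """Full 2^6 factorial in the base factors, with x5 and x10 set by the generators."""
    runs = []
    for levels in product((-1, 1), repeat=len(BASE)):
        run = dict(zip(BASE, levels))
        for name, parents in GENERATORS.items():
            value = 1
            for parent in parents:
                value *= run[parent]
            run[name] = value
        runs.append(run)
    return runs


def resolution():
    """Length of the shortest defining word (valid for two generators only)."""
    words = [frozenset(parents) | {name} for name, parents in GENERATORS.items()]
    # The product of the two generator words is their symmetric difference.
    words.append(words[0] ^ words[1])
    return min(len(word) for word in words)


runs = build_factorial_runs()
total_runs = len(runs) + 2 * 8 + 1  # factorial + axial + centerpoint
```

The three defining words have lengths 5, 6, and 5, which is consistent with the stated resolution of V, and the run total of 81 matches the 80 total degrees of freedom reported in Table 5.28.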

Table 5.28 shows the polynomial approximation based on the data in
Table 5.27, while Table 5.29 reports the eigenvalue, eigenvector, and ridge
results. The regression supplies an acceptable fit (R^2 of .90) with all eigenvalues
positive (positive definite fit), thus assuring a reasonable approximation of the
projected response. As before, the best information comes from the canonical
analysis, which in 4TERM's case provides several very useful insights. First of
all, variables x5 and x8 dominate the two axes most sensitive to changes in the
rotated coordinate system, while x7, x9, and x10 each heavily contribute,
respectively, to the three least sensitive axes. Consequently, when examining the
estimated ridge of maximum response, x5 and x8 drop significantly, with little
change in the remaining variables (of course, the eliminated variables x1 and x6
make up the difference).

Table 5.27

Experimental Design for 4TERM

                    Coded x_k                          Response
   x2   x3   x4   x5   x7   x8   x9   x10              Z(x)

   -1   -1   -1    1   -1   -1   -1   -1       41206.921875
    1   -1   -1   -1   -1   -1   -1   -1       45604.984375
   -1    1   -1    1   -1   -1   -1    1       41206.921875
    1    1   -1   -1   -1   -1   -1    1       45604.984375
   -1   -1   -1    1    1   -1   -1    1       41206.921875
    1   -1   -1   -1    1   -1   -1    1       45604.984375
   -1    1   -1    1    1   -1   -1   -1       41206.921875
    1    1   -1   -1    1   -1   -1   -1       45604.984375
   -1   -1   -1   -1   -1    1   -1    1       45252.585938
    1   -1   -1    1   -1    1   -1    1       40854.507812
   -1    1   -1   -1   -1    1   -1   -1       42080.984375
    1    1   -1    1   -1    1   -1   -1       38189.906250
   -1   -1   -1   -1    1    1   -1   -1       45252.585938
    1   -1   -1    1    1    1   -1   -1       40854.507812
   -1    1   -1   -1    1    1   -1    1       42080.984375
    1    1   -1    1    1    1   -1    1       38128.679688
   -1   -1    1   -1   -1   -1   -1    1       43439.085938
    1   -1    1    1   -1   -1   -1    1       39038.375000
   -1    1    1   -1   -1   -1   -1   -1       43439.085938
    1    1    1    1   -1   -1   -1   -1       39733.085938
   -1   -1    1   -1    1   -1   -1   -1       43439.078125
    1   -1    1    1    1   -1   -1   -1       39038.375000
   -1    1    1   -1    1   -1   -1    1       43439.085938
    1    1    1    1    1   -1   -1    1       39703.816406
   -1   -1    1    1   -1    1   -1   -1       38685.945312
    1   -1    1   -1   -1    1   -1   -1       43086.664062
   -1    1    1    1   -1    1   -1    1       35575.906250
    1    1    1   -1   -1    1   -1    1       39915.078125
   -1   -1    1    1    1    1   -1    1       38685.945312
    1   -1    1   -1    1    1   -1    1       43086.664062
   -1    1    1    1    1    1   -1   -1       35575.906250
    1    1    1   -1    1    1   -1   -1       39915.078125
   -1   -1   -1   -1   -1   -1    1    1       45604.984375
    1   -1   -1    1   -1   -1    1    1       41206.921875
   -1    1   -1   -1   -1   -1    1   -1       45604.984375
    1    1   -1    1   -1   -1    1   -1       41230.796875
   -1   -1   -1   -1    1   -1    1   -1       45604.984375
    1   -1   -1    1    1   -1    1   -1       41206.921875
   -1    1   -1   -1    1   -1    1    1       45604.984375
    1    1   -1    1    1   -1    1    1       41217.343750
   -1   -1   -1    1   -1    1    1   -1       40854.507812
    1   -1   -1   -1   -1    1    1   -1       45252.585938
   -1    1   -1    1   -1    1    1    1       37682.906250
    1    1   -1   -1   -1    1    1    1       42080.984375
   -1   -1   -1    1    1    1    1    1       40854.507812

Table 5.27 (Continued)

                    Coded x_k                          Response
   x2   x3   x4   x5   x7   x8   x9   x10              Z(x)

    1   -1   -1   -1    1    1    1    1       45252.585938
   -1    1   -1    1    1    1    1   -1       37682.906250
    1    1   -1   -1    1    1    1   -1       42080.984375
   -1   -1    1    1   -1   -1    1   -1       39038.375000
    1   -1    1   -1   -1   -1    1   -1       43439.085938
   -1    1    1    1   -1   -1    1    1       39038.375000
    1    1    1   -1   -1   -1    1    1       43439.078125
   -1   -1    1    1    1   -1    1    1       39038.375000
    1   -1    1   -1    1   -1    1    1       43439.078125
   -1    1    1    1    1   -1    1   -1       39038.375000
    1    1    1   -1    1   -1    1   -1       43439.093750
   -1   -1    1   -1   -1    1    1    1       43086.671875
    1   -1    1    1   -1    1    1    1       38685.945312
   -1    1    1   -1   -1    1    1   -1       39915.078125
    1    1    1    1   -1    1    1   -1       37220.449219
   -1   -1    1   -1    1    1    1   -1       43086.664062
    1   -1    1    1    1    1    1   -1       38685.937500
   -1    1    1   -1    1    1    1    1       40003.546875
    1    1    1    1    1    1    1    1       37186.652344
    2    0    0    0    0    0    0    0       35801.640625
   -2    0    0    0    0    0    0    0       41473.578125
    0    2    0    0    0    0    0    0       35802.417969
    0   -2    0    0    0    0    0    0       42209.976562
    0    0    2    0    0    0    0    0       35516.523438
    0    0   -2    0    0    0    0    0       42065.859375
    0    0    0    2    0    0    0    0       35802.417969
    0    0    0   -2    0    0    0    0       47621.117188
    0    0    0    0    0    0    0    2       35514.363281
    0    0    0    0    0    0    0   -2       37666.898438
    0    0    0    0    2    0    0    0       35514.363281
    0    0    0    0   -2    0    0    0       38013.695312
    0    0    0    0    0    2    0    0       35514.363281
    0    0    0    0    0   -2    0    0       49649.039062
    0    0    0    0    0    0    2    0       35514.363281
    0    0    0    0    0    0   -2    0       38764.906250
    0    0    0    0    0    0    0    0       35514.363281

                    Uncoded Values                     Coded Values
   x2   x3   x4   x5   x7   x8   x9   x10

   40   61   15   40   22   35   13   37                0
   30   51    5   30   17   25    8   27               -1
   50   71   25   50   27   45   18   47                1

Table 5.28

Regression Results for CC Design in Table 5.27 for 4TERM

Analysis of Variance

  Source   DF   Sum of Squares   Mean Square   R Square

  Model    44   807372056        18349365      .9001
  Error    36   89585747         2488493
  Total    80   896957803

Selected Parameter Estimates

  Variable               Coded Par. Est.   Uncod. Par. Est.

  Intercept              36182             141783
  x3                     -1601             -748
  x4                     -2180             -1183
  (label missing)        -4407             -1435
  x8                     -2421             -830
  x2 * x2                2372              6
  x3 * x3                2740              7
  (cross term illegible) -2975             -7
  x4 * x4                2525              25
  x5 * x5                5446              14
  x7 * x7                498               5
  x8 * x8                6316              16
  x9 * x9                874               9
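
The role of the eigenvalue signs in the canonical analysis can be shown with a minimal sketch. It classifies a fitted quadratic's stationary point from the eigenvalues of its second-order coefficient matrix and includes a closed-form eigenvalue routine for the two-variable case. The eigenvalue list is copied from Table 5.29; the function names and tolerance are illustrative assumptions, and the two-variable matrices are hypothetical.

```python
import math


def eigenvalues_2x2(a, b, c):
    """Eigenvalues of the symmetric matrix [[a, b], [b, c]], smallest first."""
    mid = (a + c) / 2.0
    rad = math.hypot((a - c) / 2.0, b)
    return mid - rad, mid + rad


def classify(eigenvalues, tol=1e-9):
    """Nature of the stationary point implied by the eigenvalue signs."""
    if all(e > tol for e in eigenvalues):
        return "minimum"   # positive definite fit
    if all(e < -tol for e in eigenvalues):
        return "maximum"   # negative definite fit
    if any(e > tol for e in eigenvalues) and any(e < -tol for e in eigenvalues):
        return "saddle"    # indefinite fit
    return "degenerate"    # at least one eigenvalue is numerically zero


# Eigenvalues reported for 4TERM in Table 5.29.
fit_4term = [6856, 5476, 2660, 2346, 2072, 864, 609, 214]
print(classify(fit_4term))

# Hypothetical two-variable fits.
print(classify(eigenvalues_2x2(2.0, 0.0, 3.0)))
print(classify(eigenvalues_2x2(1.0, 2.0, 1.0)))
```

A single slightly negative eigenvalue, as in the preliminary designs, moves the fit from the first category to the third, which is why the choice of centerpoint matters for the ridge analysis.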

Designating the outlying terminals A (where x2 and x7 represent the basing
of terminal A's trailers and tractors, respectively), B (x3, x8), C (x4, x9), and D
(x5, x10), these results show that minimal increases from the optimal solution
represented by the centerpoint involve significant reallocation of (1) trailer
resources from the central terminal to the outlying nodes B, C, and D, and (2)
tractors from the central terminal to nodes B and D. By contrast, the practical
insight of the maxima ridge strongly suggests not reducing the number of trailers
at terminals C and D or the tractors at terminal B. Furthermore, tests along the
minima ridge found multiple optimal solutions to a distance of .3 coded radius; at
1.0 the actual near-optimal solution of 35796 represents only a .79% increase over
optimality. These results, combined with those of other known optimal solutions,
provide the decision maker with a range of options in addition to the insight of
the canonical analysis.

Table 5.29

A Canonical Analysis of 4TERM

                                      Eigenvectors
  Eigenvalue   x2       x3       x4       x5       x7       x8       x9       x10

  6856         .0058    -.3387   .0002    .0198    .0002    .9405    .0165    .0002
  5476         .0626    .0696    .0450    .9946    -.0015   .0036    .0080    -.0015
  2660         .5040    .3731    .7594    -.0932   -.0003   .1319    .0626    -.0003
  2346         .7075    .2919    -.6344   -.0366   -.0042   .1018    -.0081   -.0039
  2072         -.4913   .8091    -.1274   -.0212   .0017    .2944    .0264    .0016
  864          -.0134   -.0374   -.0499   -.0021   .0177    -.0308   .9972    .0134
  609          .0058    .0011    -.0017   .0019    .8478    .0003    -.0221   .5298
  214          .0011    .0001    -.0004   .0004    -.5299   -.0000   -.0020   .8480

Estimated Minima Ridge

  Coded Radius   x2     x3     x4     x5     x7     x8     x9     x10    Z(x)*

  0.0            40.0   61.0   15.0   40.0   22.0   35.0   13.0   37.0   36182
  0.5            39.7   65.7   17.5   46.1   22.3   38.9   13.1   37.6   34431
  1.0            38.8   69.9   19.2   47.8   20.8   41.0   12.7   49.1   34310

Estimated Maxima Ridge

  Coded Radius   x2     x3     x4     x5     x7     x8     x9     x10    Z(x)*

  0.0            40.0   61.0   15.0   40.0   22.0   35.0   13.0   37.0   36182
  0.5            39.5   59.7   13.6   31.9   21.9   30.0   12.9   36.9   40332
  1.0            39.0   60.7   12.9   24.6   21.9   22.9   12.8   36.9   47280

* - Regression estimate.

Regarding  distributional  analysis,  Table  5.30  reports  the  results  for 
selected  optimal  and  near-optimal  solutions  from  the  response  surface  analysis 
and  prior  gradient  searches.  Unlike  the  previous  problems,  4TERM  exhibits  a 
relatively  stable  and  symmetric  distribution,  without  either  extremely  high-cost 
(though  relatively  rare)  scenarios,  or  detectable  parameter  or  range  changes  within 
the region of optimality. Indeed, the near-optimal solution has the highest-cost z_k,
suggesting that tradeoffs between lower maximum costs for slightly higher
expected values do not occur in this problem, at least not along the minima
ridge. Figure 5.4 gives the graphical presentation of 4TERM's distribution at
optimality. 

Finally,  following  the  pattern  in  previous  problems,  the  LH  VRT  turns  in 
excellent  results,  while  CVs  produce  a  very  mixed  bag,  with  one  case  again 
showing  an  increase  —  40%  —  over  simple  random  sampling  (Table  5.31). 
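
The variance reduction percentages can be recomputed from the sample variances reported in Table 5.31. The sketch below assumes the reduction is measured as one minus the ratio of the technique's variance to the random-sampling variance, which reproduces the tabled values; the function name is an illustrative assumption.

```python
def variance_reduction(var_rs, var_other):
    """Fractional variance reduction relative to simple random sampling.

    A negative result means the technique increased the variance.
    """
    return 1.0 - var_other / var_rs


# (RS, CV, LH) sample variances in thousands, one tuple per row of Table 5.31.
rows = [
    (121, 76, 20),
    (712, 536, 6),
    (1170, 978, 24),
    (680, 189, 11),
    (1356, 1901, 99),
]

for var_rs, var_cv, var_lh in rows:
    cv = variance_reduction(var_rs, var_cv)
    lh = variance_reduction(var_rs, var_lh)
    print(f"CV: {cv:+.2f}   LH: {lh:+.2f}")
```

The last row gives a control variate value of about -.40, the 40% increase noted above, while every LH value stays above .8.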

Table 5.30

Tolerance Limits for 4TERM (Random Seed = 58047800)

Each solution x is listed on two lines: x1..x5, x6..x10.

                                        Population        Tol. Limit
  x                  Z(x)    Mdn. z_k   Min z_k  Max z_k  Min z_k  Max z_k  % Cvg.

  144,40,61,15,40    35514   35767      22617    45962    22617    44322    .9844
  93,22,35,13,37

  137,47,61,15,40    35514   35767      22617    45962    22617    44322    .9844
  108,24,35,8,25

  151,32,60,15       35514   35767      22617    45962    22617    44322    .9844
  78,19,35,19,49

  124,39,70,19,48    35796   35767      22617    50578    22617    48938    .9922
  76,21,41,13,49

Table 5.31

Comparison of Estimator Accuracy and Variance for
RS, CV, and LH Sampling Techniques for 4TERM (/ = 50, N = 10)

Each solution x is listed on two lines: x1..x5, x6..x10.‡

  x                  Z(x)    Z_RS    s^2_RS*   Z_CV    s^2_CV*   %†     Z_LH    s^2_LH*   %†

  144,40,61,15,40    35514   35595   121       35507   76        .37    35522   20        .83
  93,22,35,13,37

  137,47,61,15,40    35514   35237   712       35246   536       .25    35515   6         .99
  108,24,35,8,25

  151,32,60,15       35514   35475   1170      35354   978       .16    35500   24        .98
  78,19,35,19,49

  137,40,63,16,44    35514   35196   680       35072   189       .72    35533   11        .98
  90,22,38,13,37

  124,39,70,19,48    35796   36165   1356      36271   1901      -.40   35768   99        .93
  76,21,41,13,49

‡ - x11 ... x15 set to zero. * - In thousands. † - % variance reduction from RS
(a negative value implies an increase).

The summary and follow-on analysis of these results can take on several
forms and emphases, depending upon the focus and interests of the
decision-maker. One possibility would expand the minima ridge insight with a
supplementary analysis of convex combinations of optimal solutions, or perhaps
finding additional response surface approximations using different optimal
centerpoints. The examples below suggest two techniques for presenting the
information.

1. Ridge Charts. This type of chart (Figures 5.5 and 5.6) captures the ridge
results for the decision-maker by plotting the amount of change in the
uncoded decision variable values per unit change in the coded deviation
from the design centerpoint (e.g., optimal solution). For example, Figure
5.5 shows that if we wish to proceed half the distance away from the
centerpoint in terms of the coded experimental design region, then
terminal D's tractor allocation would have to increase by 1, terminal C's
trailer allocation by 2, etc., to remain on the minima ridge. In essence, this
chart simply displays in graphic form the data from Table 5.29.

2. 'Rule-of-Thumb'. This type of table (Table 5.32) presents simple notional
trend information for field use or operational guidance (i.e., given a
choice, how do the drivers decide where to park the trucks overnight?).

Using these techniques, the multi-dimensional behavior of 4TERM can be
summarized for the decision-maker as stated in the analysis summary below.


Table 5.32

Notional Summary of 4TERM Sensitivity*

                 Terminal B          Terminal C          Terminal D
                 Trailer   Tractor   Trailer   Tractor   Trailer   Tractor

  Near-Optimal   Incr.     Incr.     Incr.     -         Incr.     Incr.
  Avoid          -         Decr.     Decr.     -         Decr.     -

* - Terminal A remains the same. Central node balances changes in Terminals B, C, D.


Figure 5.5. Minima Ridge Results for 4TERM (horizontal axis: unit radius)


Figure 5.6. Maxima Ridge Results for 4TERM (horizontal axis: unit radius)

4TERM Analysis Summary. Figures 5.5 and 5.6 compare deviations
from the proposed optimal solution that minimize and maximize,
respectively, increases in expected cost, while Table 5.32 provides
notional guidance. Optimal solutions should also minimize the
highest-cost scenario.

5.5  20TERM 

5.5.1  20TERM  Problem  Description 

20TERM, a straightforward extension of 4TERM, represents the largest
stochastic problem this research investigates: 63 first-stage decision variables and
40 random right-side elements (again modeling pick-up and delivery demand for
the 20 outlying terminals). 20TERM's first-stage constraint structure Ax = b
follows the same pattern as 4TERM:

    \sum_{j=1}^{21} x^j = 600            (Trailers)                  (5.4a)

    \sum_{j=22}^{42} x^j = 400           (Tractors)                  (5.4b)

    \sum_{j=43}^{62} x^j \le 10,000      (Rental Tractor-Trailer)    (5.4c)

where the cost of using x^j, j = 43, ..., 62 again is c^j = 100, while the existing fleet's
expenses are zero; i.e., c^j = 0, j = 1, ..., 42. The T matrix converts the x variables
to the recourse right-side in 4TERM's manner as well. As in the case of 4TERM,
the right-side random variables each can take one of two values with equal
probability; however, while 4TERM's eight variables allow for 256 total possible
scenarios, 20TERM's 40 variables permit over 1.0995 × 10^12 distinct realizations of
recourse demand. Finally, the size of 20TERM's recourse basis dimension
increases to 128 from 4TERM's 28.
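
The scenario counts follow directly from the two-valued random right-side elements. A minimal sketch, assuming independent elements with two possible values each (the function name is illustrative):

```python
def scenario_count(n_random_elements, values_per_element=2):
    """Number of distinct joint realizations of independent discrete elements."""
    return values_per_element ** n_random_elements


count_4term = scenario_count(8)     # 4TERM: eight random right-side elements
count_20term = scenario_count(40)   # 20TERM: forty random right-side elements

print(count_4term)
print(f"{count_20term:.4e}")
```

Enumerating every scenario is therefore practical for 4TERM but not for 20TERM, which motivates the sampling-based estimates discussed next.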

The resulting computational demands of 20TERM require modifying the
previous response analysis strategy. The first obvious change recognizes that the
true values for Z(x) cannot be found (in any practical sense); therefore, both the
search and experimental design must estimate Z(x) using Zs(x). However,
exploratory samples of 20TERM show both a significant level of variance in
h(x, ω, T) and VRT patterns similar to those seen already, thus ruling out
\hat{Z}_{RS}(x) and \hat{Z}_{CV}(x). Furthermore, both preliminary tests on 20TERM
and previous problems indicate that the Geometric Simplex and PARTAN
Algorithms are not up to the task of solving a problem of this size. Finally, tests
on sample projected gradients most often found a unique basis for every sample
realization of the random variables, thus rendering the OBS-Reset option
ineffective. Consequently, this research resorts to finding a near-optimal solution
with the Projected Gradient Algorithm using the OSL option and the LH
estimator \hat{Z}_{LH}(x) with a stratification size of 200.
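
For readers unfamiliar with Latin hypercube (LH) sampling, the following is a generic sketch of the idea applied to two-valued random elements. It is not the estimator implementation used in this research, whose details are given elsewhere; the function names, the seed, and the 0/1 demand levels are illustrative assumptions.

```python
import random


def latin_hypercube(n_strata, n_dims, rng):
    """Return n_strata points in [0, 1)^n_dims with one point per stratum per dimension."""
    columns = []
    for _ in range(n_dims):
        # One uniform draw inside each of the n_strata equal-width strata.
        column = [(i + rng.random()) / n_strata for i in range(n_strata)]
        # Shuffle so strata are paired at random across dimensions.
        rng.shuffle(column)
        columns.append(column)
    return [tuple(column[r] for column in columns) for r in range(n_strata)]


def two_point_scenarios(n_strata, n_dims, low, high, seed=0):
    """Map LH points to two-valued realizations with equal probability."""
    rng = random.Random(seed)
    points = latin_hypercube(n_strata, n_dims, rng)
    return [[low if u < 0.5 else high for u in point] for point in points]


# 200 strata over 40 random elements, mirroring the sizes quoted for 20TERM.
scenarios = two_point_scenarios(200, 40, low=0, high=1, seed=1)
```

With an even number of strata, each element takes its low and high values equally often in the sample, which removes one source of sampling noise that simple random sampling leaves in place.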

5.5.2  20TERM  Optimization  Results 

Early  research  on  applying  the  Projected  Gradient  Algorithm  to 
20TERM  found  that,  like  the  previous  problems,  the  algorithm  tends  to  find  the 
region  of  optimality  fairly  quickly.  However,  unlike  previous  attempts, 
converging  to  a  near-optimal  solution  requires  far  more  computational  time. 
Consequently,  the  following  three  modifications  help  adapt  the  Projected 
Gradient  Algorithm  to  the  demands  of  20TERM: 

1.  Starting  Solution.  Unlike  previous  applications  of  the  Projected 
Gradient  Algorithm,  where  the  initial  solution  represents  an  equal 
allocation  of  resources  to  each  first-stage  decision  variable  xJ,  this  problem 
uses  the  optimal  solution  (xev)  of  the  expected  value  approximation,  with 
EV  =  Min{cx  +  /z(x,E[co],E[T])}  as  the  starting  point  xj  (i.e.,  xi  =  xev). 
Recalling  the  estimator  7ui(xev)  equals  the  expected  result  of  using  the 
expected  value  solution  xev,  E[ZLH(xev)]  >  E[ZLH(x*)]  >  EV.  For 
20TERM,  EV=  239,272.9  while  ZLH(xev)  =  279,674.25. 
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2.  Non-Constant  Stratification  Size.  The  LH  sampling  size  of  the  x*  vector 
that  estimates  the  descent  gradient  is  set  to  200,  whereas  the  stratification 
size  of  the  responses  fitted  by  the  quadratic  for  calculating  the  stepsize 
drops  to  50.  This  idea  assumes  that  directional  information  coming  from 
the  gradient  estimate  of  a  single  solution  should  be  more  accurate  than 
any  single  response  estimate  along  the  line  projection. 

3.  Constant  Stepsize.  As  the  algorithm  approaches  near-optimality,  the 
distance  of  the  line  segment  defined  by  the  directional  gradient,  incumbent 
solution  x*,  and  lower  bounds  x  >  0  is  such  that  its  curvature  becomes  hard 
to  detect.  Therefore,  at  this  point  the  Projected  Gradient  Algorithm 
abandons  equidistant  sampling  of  the  directional  line  segment  in  favor  of 
small  constant  stepsizes  over  a  set  number  of  gradient  iterations;  in  effect, 
resorting  to  Ermoliev's  (1988)  suggestion  of  following  small,  iterative 
search  patterns. 
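
The stepsize logic behind modifications 2 and 3 can be sketched as follows. This is a minimal illustration under stated assumptions, not the dissertation's implementation: the function name `choose_stepsize`, the five-point grid, the curvature tolerance, and the fallback rule are all hypothetical. It fits a quadratic to responses sampled along the search direction and falls back to a small constant stepsize when no usable curvature is detected.

```python
import numpy as np

def choose_stepsize(steps, responses, constant_step=0.03, curvature_tol=1e-8):
    """Pick a stepsize along the search direction (hypothetical helper).

    steps      -- sampled multipliers along the line segment
    responses  -- estimated objective values at those multipliers
    Returns the minimizer of the fitted quadratic, or constant_step when the
    fit shows no positive curvature or its minimizer leaves the segment.
    """
    steps = np.asarray(steps, dtype=float)
    responses = np.asarray(responses, dtype=float)
    # Least-squares fit: responses ~ a*t^2 + b*t + c
    a, b, _ = np.polyfit(steps, responses, deg=2)
    if a <= curvature_tol:
        return constant_step
    t_min = -b / (2.0 * a)
    if t_min <= steps.min() or t_min >= steps.max():
        return constant_step
    return float(t_min)

grid = np.linspace(0.0, 4.0, 5)            # five equidistant points (assumed)
curved = (grid - 2.0) ** 2 + 1.0           # clear curvature along the line
descending = 10.0 - grid                   # no detectable curvature
step_curved = choose_stepsize(grid, curved)
step_flat = choose_stepsize(grid, descending)
```

In this sketch the first call returns the quadratic's minimizer, while the second returns the constant stepsize, mirroring the switch described in modification 3.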

Table 5.33 gives the results of the modified projected gradient search for a near-optimal solution. After two runs using quadratic estimates, the process shifts to using small constant stepsizes (.01-.04) over 50-100 iterations. The process terminates after run #13 indicates little additional progress being made after 100 iterations. It should be noted that while the projected gradients show continued descent possible, the 'leveling-off' trend suggests the true optimal solution will not be much less than 254,000; thus, the best solution found in run #13 should provide an adequate centerpoint (xcp) for fitting a response surface.

Table 5.33

Projected Gradient Results for 20TERM

Run #   Rand. Seed   Sclr. Type*   # Iter. (k)   CPU Time†   Start ZLH(x)   Best ZLH(x)‡
  1          13414   Quadr. Est.        37          2473       279674.25      267810.03
  2         885623   Quadr. Est.        14           450       267573.59      266883.09
  3       64069848   Const. .01         50          1163       267508.91      266271.16
  4          22691   Const. .03         50          1153       266719.00      264910.53
  5         720760   Const. .04        100          2164       265302.31      264410.84
  6        6273213   Const. .04         50          1105       265082.19      262471.28
  7         931925   Const. .03         50           982       262697.75      261063.59
  8         405692   Const. .02         75          1456       262228.81      259841.73
  9        2555416   Const. .02         75          1452       259614.28      258648.66
 10      598372248   Const. .03         75          1438       258730.72      257718.31
 11        3956152   Const. .03         75          1409       258202.91      256692.53
 12        4106067   Const. .03         75          1364       257749.39      255797.83
 13      512255541   Const. .03        100          1836       256513.97      254945.70

* - Parameter (symbol illegible in source) = 5 for quadratic estimates.
† - Units in seconds; total time 5 hrs. 7.5 mins.
‡ - Not necessarily the k-th observation.


(Morton's  (1994)  results  found  the  lower  and  upper  bounds  of  Z(x*)  to  be  249,747 
and  256,497,  respectively.)  Refer  to  Table  5.34  in  the  following  section  for  the 
values  of  xev  and  xcp. 

5.5.3  20TERM  Response  Surface  Analysis 

Since 20TERM's number of first-stage decision variables renders their description in every table cumbersome, Table 5.34 consolidates the values of xev, the best solution found by the Projected Gradient Algorithm (xcp), and the factor ranges of the Plackett-Burman screening design. Table 5.34 also presents a good starting point for defining those factors and associated parameters for the experimental design phase of the analysis.

Table 5.34

Expected Value Approximation (xev), Experimental Design

-   35.5   35   26   32   --   41   --   --   --   30.5
+   45.5   45   36   42   --   51   --   --   --   40.5

Tractors

As discussed in Section 4.3, problems the size of 20TERM preclude using full factorial designs; indeed, even highly fractionated experimental designs for the number of factors present in 20TERM are prohibitive. Consequently, reducing the number of factors based on subjective interests of the decision-maker, a priori knowledge or insight into the problem, or preliminary screening becomes necessary. This research employs the last technique by selecting the nineteen xj variables that differ the most between xev and xcp as candidates for a Plackett-Burman screening design suggested by Montgomery (1984) and Plackett and Burman (1946). This criterion implies an interest in those factors that we wish to avoid changing. By contrast, screening for those factors that change the least during the search assumes a greater interest in finding alternative near-optimal solutions. In this instance, although 36 variables change value from starting to near-optimal solutions, the most significant differ by 5 or more. As in the case of 4TERM, variables x1 and x22 provide the necessary degree of freedom for the equality constraints; hence, they are ignored. Table 5.35 shows the factor settings and estimated responses, while Table 5.36 provides the regression analysis.

The coded parameter estimates provide the basis for selecting which factors to include in a CCD design. Following the selection criterion of using those variables that most influence the response, the following 11 variables represent a descending order of first-order significance: x4, x14, x7, x21, x37, x13, x51, x10, x62, x12, and x2. Although x11 could arguably be included (its parameter estimate is only 10 less than x2), CCD design limitations restrict the number of factors to 11 in order to keep the fractional portion to 128 runs and still retain a resolution V level.
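
The selection rule above amounts to ranking the coded first-order estimates of Table 5.36 by magnitude and keeping the largest 11. The short sketch below illustrates that ranking; the dictionary name is hypothetical, and the row labels x2 and x12, as well as the placeholder name for the one row whose label is not legible, are assumptions of this sketch rather than values confirmed by the source.

```python
# Coded parameter estimates copied from Table 5.36.
# "x2", "x12" and "unlabeled" are assumed labels (see the note above).
coded_estimates = {
    "x2": -654.63, "x3": -542.70, "x4": -1786.22, "x7": -905.39,
    "x10": -714.49, "x11": -644.85, "x12": -665.11, "x13": -729.11,
    "x14": -945.78, "x15": -496.35, "unlabeled": -517.94, "x21": -882.77,
    "x37": -850.18, "x43": -143.27, "x47": -506.37, "x48": -528.24,
    "x50": -315.83, "x51": -727.77, "x62": -668.19,
}

# Rank factors by the absolute size of their coded first-order effect.
ranked = sorted(coded_estimates, key=lambda name: abs(coded_estimates[name]), reverse=True)
selected = ranked[:11]   # factors carried into the central composite design
runner_up = ranked[11]   # first factor left out
```

Under these assumptions the ranking reproduces the order listed in the text, with x11 as the first factor excluded.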

Table 5.35

Plackett-Burman Screening Design for 20TERM

Run   Selected Variables From Table 5.34*   ZLH(x)
  1                                         272044
  2                                         268636
  3                                         269038
  4                                         271066
  5                                         270387
  6                                         269650
  7                                         271463
  8                                         271990
  9                                         272466
 10                                         274566
 11                                         269603
 12                                         271065
 13                                         271042
 14                                         272404
 15                                         269194
 16                                         270591
 17                                         270286
 18                                         269377
 19                                         271671
 20                                         284791

(The +/- settings of the 19 selected variables in each run are not recoverable from the source.)

* - Columns correspond to variables in Table 5.34 (from left to right in ascending order of j) for those variables with values for '-' and '+'. For instance, the second column represents x3, where '-' equals 50 and '+' equals 60. Random Seed for this design is 630011823.

The final fractional portion of the CCD design follows a design generator suggested by Lorenzen and Anderson (1993), where a full-factorial design for seven variables (x4, x7, x14, x21, x37, x51, x62) provides the structure for the 2^(11-4) = 128 runs. The remaining factors' coded values are calculated as follows:

    x_{2} = x_{21} \cdot x_{37} \cdot x_{51} \cdot x_{62}                                        (5.5a)

    x_{10} = x_{7} \cdot x_{14} \cdot x_{51} \cdot x_{62}                                        (5.5b)

    x_{12} = x_{4} \cdot x_{14} \cdot x_{37} \cdot x_{62}                                        (5.5c)

    x_{13} = x_{4} \cdot x_{7} \cdot x_{14} \cdot x_{21} \cdot x_{37} \cdot x_{51} \cdot x_{62}.   (5.5d)
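
The generators (5.5a) through (5.5d) can be turned into the 128-run fraction mechanically. The following sketch is illustrative only: the names `BASE`, `GENERATORS`, and `build_fraction` are hypothetical, and the run order produced here is not claimed to match the ordering used in the appendix.

```python
from itertools import product

# Seven base factors that form the full 2^7 factorial.
BASE = ("x4", "x7", "x14", "x21", "x37", "x51", "x62")

# Generated factors, each the product of the listed base factors (5.5a-5.5d).
GENERATORS = {
    "x2": ("x21", "x37", "x51", "x62"),
    "x10": ("x7", "x14", "x51", "x62"),
    "x12": ("x4", "x14", "x37", "x62"),
    "x13": BASE,
}

def build_fraction():
    """Return the 2^(11-4) design as a list of {factor: coded level} dicts."""
    runs = []
    for levels in product((-1, 1), repeat=len(BASE)):
        run = dict(zip(BASE, levels))
        for name, parents in GENERATORS.items():
            value = 1
            for parent in parents:
                value *= run[parent]
            run[name] = value
        runs.append(run)
    return runs

design = build_fraction()
factors = list(BASE) + list(GENERATORS)
```

Every column of the resulting design is balanced and mutually orthogonal, which is what allows the 11 main effects to be estimated independently.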

Table 5.36

Regression Results for Plackett-Burman Design in Table 5.35

Analysis of Variance

Source   DF   Sum of Squares   Mean Square   R Square
Model    19        223527868      11764625      1.0
Error     0
Total    19        223527868

Selected Parameter Estimates

Variable       Coded Par. Est.   Uncod. Par. Est.
Intercept           271566            359928
x2                 -654.63           -130.93
x3                 -542.70           -108.54
x4                -1786.22           -357.24
x7                 -905.39           -181.08
x10                -714.49           -142.90
x11                -644.85           -128.99
x12                -665.11           -133.02
x13                -729.11           -145.82
x14                -945.78           -189.16
x15                -496.35            -99.27
(illegible)        -517.94           -103.53
x21                -882.77           -176.55
x37                -850.18           -170.04
x43                -143.27            -28.65
x47                -506.37           -101.27
x48                -528.24           -105.65
x50                -315.83            -63.17
x51                -727.77           -145.55
x62                -668.19           -133.64

The next step of determining the axial values and number of centerpoints requires special consideration due to the variability of the response estimator ZLH(x). As related by Lorenzen and Anderson (1993), where F represents the size of the fractional portion (F = 2^(K-P)), 2K the number of axial points, and n_R the number of centerpoint replications, a CCD design becomes orthogonal by selecting the axial coded multiplier α_o using the relations

    Q = \left( \sqrt{F + 2K + n_R} - \sqrt{F} \right)^2        (5.6a)

    \alpha_o = \left( QF/4 \right)^{1/4},                       (5.6b)

whereas the axial multiplier α_R for a rotatable design is

    \alpha_R = F^{1/4}.                                         (5.7)

Since F and K are fixed, it follows immediately from (5.6) and (5.7) that n_R should be selected such that Q equals 4 for a design to have both properties. Applying this result to the proposed design for 20TERM (F = 128, K = 11) gives the equivalent expression of finding the integer n_R such that

    n_R \approx \left( 2 + \sqrt{128} \right)^2 - 150,          (5.8)

or n_R = 27. Plugging n_R back into (5.6) gives the final design's orthogonal axial multiplier α_o as 3.3555, compared to the rotatable multiplier α_R = 3.3636.
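
The arithmetic behind the centerpoint count and the two axial multipliers can be checked with a few lines of code. This sketch assumes the relations read Q = (sqrt(F + 2K + n_R) - sqrt(F))^2, α_o = (QF/4)^(1/4), and α_R = F^(1/4), a reading that is consistent with the reported values 27, 3.3555, and 3.3636; the function names are hypothetical.

```python
import math

def ccd_axial_multipliers(F, K, n_center):
    """Orthogonal and rotatable axial multipliers for a CCD (assumed relations)."""
    q = (math.sqrt(F + 2 * K + n_center) - math.sqrt(F)) ** 2
    alpha_orth = (q * F / 4.0) ** 0.25
    alpha_rot = F ** 0.25
    return alpha_orth, alpha_rot

def center_runs_for_both(F, K):
    """Nearest integer number of centerpoints that makes Q equal 4."""
    return round((2.0 + math.sqrt(F)) ** 2 - F - 2 * K)

# 20TERM design: F = 2^(11-4) = 128 fractional runs, K = 11 factors.
n_center = center_runs_for_both(128, 11)
alpha_orth, alpha_rot = ccd_axial_multipliers(128, 11, n_center)
```

Because n_R must be an integer, Q is only approximately 4, which is why the orthogonal and rotatable multipliers differ slightly.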

The appendix presents the experimental design results using the three primary estimators ZRS(x), ZCV(x), and ZLH(x). Table 5.37 on the following pages gives the regression results for the most significant parameters, with the linear and quadratic terms dominating the polynomial approximation. Table 5.38 provides the A canonical analysis results.

Since the Plackett-Burman screening design focuses on those xj that influence the estimated response ZLH(x) the most, the resulting canonical analysis provides an excellent estimate of the maxima ridge. By contrast, while Table 5.38 does provide a minima ridge assessment, a better estimate of the direction of minimum sensitivity can be found by repeating the preceding steps with a screening design composed of those xj that affect the estimated response ZLH(x) the least. Therefore, this summary will forgo a minima ridge recommendation, and concentrate instead on characterizing the most influential components of x.

Table 5.37

Regression Results for Experimental Design in Appendix for 20TERM*

Analysis of Variance

Source   DF   Sum of Squares   Mean Square   F-Ratio   R Square
Model    73       1288589185      17651907       127      .8903
Error    26          3612523        138943
Total    99       1292201709

Selected Parameter Estimates

Variable     Uncod. Par. Est.   Std. Error   Cod. Par. Est.
Intercept          685077.00     33499.00        255528.00
x2                  -2917.43       540.06        -43178.00
x4                  -4077.62     1012.075         -4181.95
x7                  -2737.28       540.84         -2048.05
x10                 -3199.12       542.45         -3526.97
x12                 -3140.76       411.21         -4720.38
x13                 -2970.91       410.70         -3568.97
x14                 -2093.84       549.42         -1387.97
x21                 -3235.77       406.38         -5210.70
x37                 -2060.16       810.09         -2774.82
x51                 -2275.19       811.86         -2099.77
x62                 -1157.21       544.96         -1610.20
x2 · x2                32.41         4.03         20531.00
x4 · x4                65.56        14.18         11814.00
x7 · x7                25.08         4.03         15892.00
x10 · x10              33.60         4.03         21289.00
x12 · x12              23.79         2.27         26791.00
x12 · x21               6.40         3.19          7213.03
x13 · x13              20.62         2.29         23221.00
x13 · x21               7.07         3.19          7959.89
x14 · x14              30.64         4.03         19410.00
x21 · x21              25.16         2.27         28337.00
x37 · x37              41.43         9.07         11665.00
x51 · x51              38.39         9.07         10810.00
x62 · x62              17.39         4.03         11016.00

* - Based on ZLH(x) estimator.
Table 5.38

A Canonical Analysis of 20TERM

                              Eigenvectors
Eigenvalues      x2       x4       x7      x10      x12      x13
   38670       .2619    .1484    .2363    .3093    .4796    .4061
   23945       .0439    .0037    .0255   -.0581    .7836   -.0340
   21508       .2778    .0783    .1317    .2816   -.3619    .6327
   19842       .4229    .0638    .0228   -.4375   -.0031   -.1586
   19503       .1807    .0712    .0365    .7517   -.0358   -.5735
   18848      -.7904    .0873    .0169    .1558    .0804    .1588
   14798      -.1004    .0196    .9207   -.1198   -.1100   -.1939
   12528       .0201    .5770   -.0066   -.1349   -.0025   -.0928
   11136       .0214   -.3849    .0413    .0061    .0253    .0125
   10399       .0717   -.4101   -.1688    .0888   -.0011   -.0485
    9597      -.0065    .5526   -.2146    .0217   -.0672   -.0557

Eigenvalues     x14      x21      x37      x51      x62
   38670       .0061    .5929    .0367    .0864    .0119
   23945       .0282   -.6124   -.0464    .0306   -.0002
   21508       .2054   -.4931    .0533    .0121    .0198
   19842       .7636    .1177   -.0370    .0367   -.0155
   19503       .2207   -.0667    .0123   -.1110    .0091
   18848       .5472    .0456    .0818    .0549    .0177
   14798      -.0846   -.0763   -.0309    .2525    .0035
   12528      -.1023   -.0485    .7625   -.1071    .1845
   11136       .0467    .0250    .0706   -.0692    .9145
   10399       .0252   -.0080    .4598    .7444   -.1468
    9597      -.0728   -.0326   -.4320    .5835    .3270

Estimated Minima Ridge

Coded Radius     x2      x4      x7     x10     x12     x13     Z(x)*
    0.0         25.5    13.5    26.0    27.0    40.5    40.0    255528
    1.0         28.9    21.6    21.4    28.6    41.2    38.6    261453

Coded Radius    x14     x21     x37     x51     x62     Z(x)*
    0.0         31.0    35.5    17.0    17.5    28.5    255528
    1.0         30.6    36.4    18.9    28.2    38.5    261453

Estimated Maxima Ridge

Coded Radius     x2      x4      x7     x10     x12     x13     Z(x)*
    0.0         25.5    13.5    26.0    27.0    40.5    40.0    255528
    1.0         18.1    10.9    20.2    19.0    24.6    26.7    304841

Coded Radius    x14     x21     x37     x51     x62     Z(x)*
    0.0         31.0    35.5    17.0    17.5    28.5    255528
    1.0         30.1    16.4    15.7    15.8    27.6    304841

* - Regression estimate.

In that context, Table 5.38 shows that reducing x2, x7, x10, x12, x13, and x21 from their xcp values considerably increases the estimated response ZLH(x). Examining the eigenvectors reveals these factors as prominent components in the rotated axes with the highest eigenvalues, although in this design every axis exhibits significant curvature (again reflecting the screening design's choices). Figure 5.7 expresses this phenomenon in graphical terms for easier understanding.
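
For readers unfamiliar with canonical analysis, the eigenvalues and eigenvectors reported in Table 5.38 come from an eigendecomposition of the symmetric matrix of second-order coefficients of the fitted quadratic. The sketch below shows the idea on a small hypothetical 2 x 2 matrix (not data from 20TERM); the function name and the example matrix are assumptions for illustration.

```python
import numpy as np

def canonical_analysis(B):
    """Eigenvalues (largest first) and matching eigenvectors of a symmetric
    second-order coefficient matrix B from a fitted quadratic x'Bx."""
    B = np.asarray(B, dtype=float)
    eigenvalues, eigenvectors = np.linalg.eigh(B)   # eigh: ascending order
    order = np.argsort(eigenvalues)[::-1]
    return eigenvalues[order], eigenvectors[:, order]

# Hypothetical example: diagonal holds pure quadratic terms,
# off-diagonals hold half of the cross-product coefficient.
B_example = np.array([[2.0, 1.0],
                      [1.0, 2.0]])
eigenvalues, eigenvectors = canonical_analysis(B_example)

# All eigenvalues positive: the stationary point is a minimum, and the
# eigenvector with the largest eigenvalue marks the steepest rise.
is_minimum = bool(np.all(eigenvalues > 0.0))
```

Each column of `eigenvectors` is a rotated axis; a large loading of a factor on a high-eigenvalue axis is what identifies that factor as a prominent contributor to curvature.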

Unlike the previous problems, 20TERM does not afford the true population parameters for comparison to sample-based estimates. Furthermore, the current design's emphasis on influential factors suggests little likelihood of finding lower maximum values of zk at any location other than xcp. Therefore, this analysis presents the tolerance limits for xev and xcp in Table 5.39 and a histogram of 400 random samples of zk at xcp in Figure 5.8. Both the tolerance limit results and sample distribution suggest 20TERM follows a near-symmetrical distribution similar to 4TERM. These results suggest the following analysis summary.

20TERM Analysis Summary. Suggest using xcp as defined in Table 5.34. Avoid reducing the current values of the decision variables as shown in Figure 5.7; however, increases in these figures can occur with small gains in Z(x). Tolerance limits suggest near-symmetrical distribution with upper limit approximately $50k over expected value. Minima ridge estimation requires further analysis.

Figure 5.7. Maxima Ridge Results for 20TERM (horizontal axis: Unit Radius)

Table 5.39

Tolerance Limits for 20TERM (Random Seed - 3623643)

                       Tolerance Limit
xk     ZRS(xk)    Lower Limit zr    Upper Limit zm
xev     283596        239281            339276
xcp     258557        230766            307655
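
As background on non-parametric tolerance limits, the classical order-statistic result gives the confidence that the interval between the smallest and largest of n independent observations covers at least a fraction p of a continuous distribution. The sketch below evaluates that standard formula; it assumes the extreme order statistics are used as limits, which the text here does not state, and the function name is hypothetical.

```python
def coverage_confidence(n, p):
    """Confidence that [sample min, sample max] of n i.i.d. draws from a
    continuous distribution covers at least a fraction p of it."""
    return 1.0 - n * p ** (n - 1) + (n - 1) * p ** n

# Example with the sample size of the histogram in Figure 5.8 (400 draws)
# and an assumed coverage target of 99 percent.
conf_400 = coverage_confidence(400, 0.99)
```

The formula requires no distributional assumption beyond continuity, which is what makes tolerance limits attractive when the distribution of the recourse cost is unknown.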


Figure 5.8. Sample Distribution of zk for Centerpoint (xcp) (horizontal axis: 226 to 298, in thousands)

Finally, although this research uses LH sampling only for the previous analysis of 20TERM, applying RS and CV sampling to (1) the centerpoint portion of the experimental design and (2) the entire final design in the appendix provides ways to measure the amount of variance reduction using LH sampling. First, using the 27 estimators of the centerpoint in the same fashion as the previous problems use 10 estimators of a given Z(xk) gives the results shown in Table 5.40. Since the sample variance estimators in Table 5.40 are equivalent to the mean square pure error under linear regression, a natural extension of such an analysis would compare the regression results using different sampling techniques for the entire design. Table 5.41 provides such a comparison (note that both the lack of fit and pure error drop when using a VRT, especially LH sampling).

Table 5.40

Comparison of Estimator Accuracy and Variance for
RS, CV, and LH Sampling Techniques for 20TERM (7=50, N=27)

                          ZRS(x0)      ZCV(x0)      ZLH(x0)
                        256393.22    255784.72    256009.81
                        255674.83    255718.84    254763.36
                        253818.95    253197.25    255341.89
                        253555.20    253433.41    254785.86
                        252856.48    252810.88    255575.00
                        255490.61    255577.50    255235.52
                        252902.50    254253.78    255041.83
                        253241.92    254443.30    255877.17
                        256394.80    256144.66    255931.16
                        254077.73    253923.22    255685.38
                        254993.14    254751.05    255091.02
                        257465.98    257531.95    255311.55
                        254823.16    255137.92    255446.63
                        255125.48    254839.91    255863.88
                        255648.98    255481.08    255543.17
                        258436.55    257506.67    254989.86
                        257079.67    255383.59    255698.77
                        254280.70    254184.72    254915.28
                        252949.27    252760.42    255062.73
                        260193.55    260242.17    255693.64
                        256091.58    256139.03    255246.78
                        257106.16    254932.30    255796.36
                        254390.89    254333.63    255222.17
                        255581.75    255756.41    255611.36
                        255913.77    256291.66    255996.63
                        257371.08    258470.84    255658.77
                        254371.97    254671.95    255388.92

                          ZRS(x0)      ZCV(x0)      ZLH(x0)
Mean                    255415.92    255322.33    255436.46
Sample Variance           3226848      2883646       138943
% Variance Reduction           --        .1064        .9569
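
The variance reduction figures in Table 5.40 follow directly from the three sample variances, taking random sampling as the baseline. The brief sketch below reproduces that arithmetic; the function and constant names are hypothetical.

```python
def variance_reduction(var_baseline, var_technique):
    """Fraction of the baseline variance removed by a variance reduction technique."""
    return 1.0 - var_technique / var_baseline

# Sample variances of the 27 centerpoint estimators (Table 5.40).
VAR_RS = 3226848.0   # random sampling (baseline)
VAR_CV = 2883646.0   # control variates
VAR_LH = 138943.0    # Latin hypercube

reduction_cv = variance_reduction(VAR_RS, VAR_CV)
reduction_lh = variance_reduction(VAR_RS, VAR_LH)
```

Expressed this way, control variates remove roughly a tenth of the baseline variance, whereas Latin hypercube sampling removes well over nine tenths.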
Table 5.41

Regression Results for Experimental Design in Appendix for 20TERM

Random Sampling

Residual       DF   Sum of Squares   Mean Square   F-Ratio   R Square
Lack of Fit    73       1441633317      19748402      6.12      .8757
Pure Error     26         83898043       3226848
Total Error    99       1525531360      15409408

Control Variates

Residual       DF   Sum of Squares   Mean Square   F-Ratio   R Square
Lack of Fit    73       1426942064      19547152      6.78      .8787
Pure Error     26         74974798       2883646
Total Error    99       1501916862      15170877

Latin Hypercube

Residual       DF   Sum of Squares   Mean Square   F-Ratio   R Square
Lack of Fit    73       1288589185      17651907    127.00      .8903
Pure Error     26          3612523        138943
Total Error    99       1292201709      13052543
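
The F-ratios in Table 5.41 are the lack-of-fit mean square divided by the pure-error mean square. The sketch below recomputes them from the tabulated sums of squares and degrees of freedom; the names are hypothetical, and small differences from the printed ratios are attributed here to rounding in the source.

```python
def lack_of_fit_f(ss_lof, df_lof, ss_pe, df_pe):
    """F-ratio for lack of fit: (SS_lof / df_lof) / (SS_pe / df_pe)."""
    ms_lof = ss_lof / df_lof
    ms_pe = ss_pe / df_pe
    return ms_lof / ms_pe

# (SS lack of fit, DF lack of fit, SS pure error, DF pure error) from Table 5.41.
TABLE_5_41 = {
    "RS": (1441633317, 73, 83898043, 26),
    "CV": (1426942064, 73, 74974798, 26),
    "LH": (1288589185, 73, 3612523, 26),
}

f_ratios = {name: lack_of_fit_f(*row) for name, row in TABLE_5_41.items()}
```

Note that the much larger ratio under Latin hypercube sampling stems mainly from its far smaller pure-error mean square, since the lack-of-fit mean square itself is the lowest of the three.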
Chapter 6


Conclusions 


6.1  Introduction 

This  chapter  closes  this  dissertation  by  reviewing  its  research  efforts  in  the 
following  format: 

1.  Results  and  Contributions.  Surveys  the  computational  requirements  and 
empirical  results  of  the  proposed  techniques  for  finding  x*,  deriving  a 
polynomial  approximation  of  Z(x)  as  a  function  of  x,  and  integrating  the 
underlying  distributional  characteristics  in  the  decision  process.  Also 
examines  this  dissertation's  contributions  —  and  their  significance  —  to 
the  topic  of  two-stage  stochastic  linear  programming  with  recourse. 

2.  Recommendations  for  Future  Research.  Suggests  areas  of  future  research 
based  on  the  discoveries  of  this  study. 

3.  Conclusions.  Summarizes  this  dissertation's  accomplishments. 

Each  section  follows  the  same  organizational  format  as  the  rest  of  this  dissertation 
regarding  the  topics  listed  under  Optimization  Methods  and  Statistical  Analysis. 
Figure 6.1 on the following page provides a summary chart of this chapter for quick reference.

6.2 Results and Contributions
6.2.1 Optimization Methods

This dissertation investigated several techniques in two basic categories for finding x* as the starting point for conducting response surface analyses: search methods (projected gradient, geometric simplex, and PARTAN), and optimization algorithms (optimal bases and dual vector sets). Regarding search methods, the Projected Gradient Algorithm clearly outperforms the other two algorithms for all problems in terms of computational duration and convergence; indeed, neither the Geometric Simplex nor the PARTAN Algorithm can handle larger problems (20TERM) in any reasonable amount of time. The Projected Gradient Algorithm's clearest advantage lies in its ability to find an accurate directional descent vector; neither the straightforward Geometric Simplex nor PARTAN's parallel tangent property gave better directional guidance for the computational time either saved or expended, respectively. Furthermore, using the quadratic fit of the response along the directional descent for estimating the stepsize gives the Projected Gradient Algorithm the capability to find the region of optimality fairly quickly even for the largest problems; only in the region of optimality for those cases does it become more tractable to resort to predetermined stepsizes.

As for the OBS-Complete and ODV techniques, such methods provide clear computational advantages for smaller recourse problems. The OBS-Reset option also proves advantageous over repetitive OSL calls for medium-sized problems, but not by the order of magnitude seen with the smaller ones. Ultimately, though, this technique is problem-dependent, and for very large problems may not be a viable option.

6.2.2  Statistical  Analysis 

The  results  from  the  response  surface  approximation  of  Z(x)  establish  the 
viability  and  usefulness  of  this  form  of  analysis  for  two-stage  stochastic  linear 
programming  with  recourse.  Specifically,  the  results  and  contributions  of  this 
research  in  this  category  are  summarized  below: 

1. Variance Reduction. This dissertation establishes that Latin hypercube sampling guarantees a reduction in the variance of the sample estimator of Z(x) over random sampling for two-stage capacity expansion problems, and empirically confirms such reductions as both large and consistent for the set of test problems. Most importantly, this variance reduction technique can be applied to any algorithm or analytical technique that employs statistical estimation of the objective function for two-stage stochastic linear programming problems with recourse. By contrast, using the random elements of T and ω as control variates generally does not reduce the variance nearly as well as the Latin hypercube technique; indeed, cases exist where such controls increase the variance of the sample estimator. Furthermore, as a practical matter using Latin hypercube sampling demands very few additional computations, and (unlike control variates) requires no knowledge or guesses on correlation to the response h(x,ω,T).

2.  Experimental  Design.  This  study  demonstrates  that  experimental  design 
techniques  —  such  as  preliminary  factor  screening,  fractional  design,  and 
orthogonal,  rotatable  central  composite  designs  —  can  be  successfully 
applied  to  this  class  of  problems. 

3.  Response  Surface  Analysis.  This  research  demonstrates  the  feasibility  of 
fitting  a  second-order  polynomial  to  Z(x)  in  the  region  of  optimality. 
Although  sometimes  requiring  factor  adjustments  in  range  or  centerpoints, 
all  problems  in  the  test  set  can  be  fit  with  a  positive  definite  quadratic 
form  and  R2  factor  near  .9  or  better.  Most  importantly,  the  canonical 
analysis  of  these  approximations  empirically  confirms  the  existence  of 
optimal  or  near-optimal  regions,  and  provides  a  method  of  sensitivity 
analysis  not  available  until  now. 

4.  Tolerance  Limits.  Finally,  this  dissertation  applies  the  non-parametric 
technique  of  tolerance  limits  to  characterize  the  underlying  distribution, 
and  to  incorporate  such  results  in  the  decision-making  process.  Although 
problem-dependent,  such  analysis  found  cases  in  the  current  problem  set 
where  either  pathologically  skewed  distributions  or  reduced  tolerance 
ranges  for  near-optimal  solutions  suggest  expanding  the  decision  criteria 
beyond  Min  Z(x). 
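
To make the variance reduction result in item 1 concrete, the sketch below contrasts Latin hypercube sampling with plain random sampling on a simple stand-in response. It is illustrative only: the response function, sample sizes, seed, and all names are hypothetical and unrelated to the test problems, and the construction shown is the textbook one (one point per equal-probability stratum in each dimension, with strata paired at random across dimensions).

```python
import numpy as np

def latin_hypercube(n, dims, rng):
    """n points in [0,1)^dims with exactly one point per stratum in each dimension."""
    # Independent random permutation of the n strata in every dimension.
    strata = np.argsort(rng.random((n, dims)), axis=0)
    # Place each point uniformly within its assigned stratum.
    return (strata + rng.random((n, dims))) / n

def mean_estimates(sampler, f, n, reps, rng):
    """Repeat the sample-mean estimate of E[f(U)] reps times."""
    return np.array([f(sampler(n, rng)).mean() for _ in range(reps)])

rng = np.random.default_rng(12345)
f = lambda u: np.sum(u ** 2, axis=1)               # hypothetical stand-in response
lh_sampler = lambda n, r: latin_hypercube(n, 2, r)
rs_sampler = lambda n, r: r.random((n, 2))

var_lh = mean_estimates(lh_sampler, f, 50, 200, rng).var(ddof=1)
var_rs = mean_estimates(rs_sampler, f, 50, 200, rng).var(ddof=1)
sample = latin_hypercube(50, 2, rng)
```

The only extra work relative to random sampling is one permutation per dimension, which illustrates why the technique adds very little computational cost.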
6.3 Recommendations for Future Research
6.3.1 Optimization Methods

The  disappointing  results  of  the  Geometric  Simplex  (and  to  a  lesser  extent 
PARTAN)  Algorithms  offer  limited  possibilities  for  further  research  in  these 
areas;  both  their  performance  and  inherent  liabilities  in  the  stochastic  recourse 
environment  suggest  little  likelihood  of  improvement.  By  contrast,  the  Projected 
Gradient  Algorithm  proves  itself  to  be  a  viable  method  for  finding  the  optimal 
or  near-optimal  solution  for  even  the  largest  problems,  and  perhaps  can  be 
improved  upon  in  the  following  areas: 

1.  Stepsize  Estimation.  This  dissertation  shows  that  using  a  quadratic 
estimate  of  Z(x)  as  a  function  of  the  projected  gradient  multiplier  very 
quickly  finds  the  region  of  near-optimality;  however,  the  method  does  not 
always  work,  especially  as  the  search  nears  optimality.  Further 
investigations  into  different  approximation  methods  may  increase  its 
accuracy. 

2. Adaptive Search Techniques. As implemented, the Projected Gradient Algorithm uses constant parameters for the number of points searched along the line segment, their sampling size, and the length of the line segment itself. However, this 'one-size-fits-all' approach clearly does not work as efficiently in the region of optimality, again particularly for larger problems. Consequently, a more dynamic approach whereby the algorithm adjusts the search process in order to gain more precise information in the immediate area of the incumbent solution should provide better results.

Regarding  optimal  basis  sets,  their  use  clearly  provides  computational 
advantages  whenever  the  specific  problem  makes  them  available.  Unfortunately, 
this  research  strongly  suggests  that  the  OBS-Complete  technique  will  not  be 
practical  for  larger  problems.  However,  this  study  does  not  fully  explore  cases 
where  the  OBS-Reset  method  might  prove  practical  for  even  larger  problems. 
These  extensions  include: 

1. Expanded Reset Option. The current algorithm resets the optimal basis set for each feasible xk, regardless of the specifics of the algorithm in use; however, another feasible xk+1 'nearby' may share a significant number of optimal bases. For instance, the proposed reduced line segment in a revised Projected Gradient Algorithm may require a reasonably small number of optimal bases along its entire length; in such a case, the short segments of a problem like 20TERM in its region of optimality could be estimated more quickly. This same phenomenon might occur in experimental design settings as well. Obviously, repetitive sampling of a single point (such as the centerpoint) would benefit; however, with the reduced number of factors common to fractional designs, a single optimal basis set might still be practical. Furthermore, such a case would allow replications at all design points (not just the centerpoints), providing even better estimates of experimental error.

2. Dynamic Reset. Another suggestion would be to 'sort-and-trim' the optimal basis set as an integral part of the iterations of any particular algorithm. Based on frequency of optimality, as less-used bases progressively move to the bottom of the list they would be replaced by newer, more frequently used optimal bases in an on-going process.
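
One way to picture the 'sort-and-trim' idea is a small frequency-ordered cache of bases. The sketch below is purely hypothetical: the class name `BasisCache`, the use of opaque basis identifiers, the fixed capacity, and the eviction rule are assumptions made for illustration, not a description of any implementation in this research.

```python
from collections import Counter

class BasisCache:
    """Keep at most `capacity` bases, ordered by how often each was optimal."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.hits = Counter()

    def record(self, basis_id):
        """Count one more scenario solved by this basis; trim if over capacity."""
        self.hits[basis_id] += 1
        if len(self.hits) > self.capacity:
            # Drop the least-used basis other than the one just recorded.
            victim = min((b for b in self.hits if b != basis_id),
                         key=self.hits.__getitem__)
            del self.hits[victim]

    def ordered(self):
        """Bases sorted from most to least frequently optimal."""
        return [basis for basis, _ in self.hits.most_common()]

cache = BasisCache(capacity=2)
for basis in ("A", "A", "B", "C"):
    cache.record(basis)
search_order = cache.ordered()
```

Trying bases in `search_order` puts the most frequently optimal ones first, while the capacity limit keeps the list from growing with every new scenario.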

6.3.2  Statistical  Analysis 

Just  as  this  dissertation's  principal  contributions  lie  in  applying  statistical 
analysis  techniques  —  variance  reduction,  response  surface  analysis,  and  non- 
parametric  statistics  —  to  the  recourse  problem,  so  do  the  most  interesting 
avenues  for  further  research.  Specifically,  these  include  the  following 
suggestions: 

1.  Variance  Reduction.  While  the  Latin  hypercube  technique  substantially 
lowers  the  variance  of  the  estimators  of  Z(x),  even  further  variance 
reduction  may  be  possible  through  its  combined  use  with  other  VRTs. 
One  particularly  promising  prospect  involves  using  a  single  control  variate 
proposed  by  Morton  (1995b). 

2. Response Surface Analysis. This research employs only basic experimental designs and response surface techniques to describe Z(x). Additional areas of research include using minimum bias designs, experimental design structures other than central composite designs, and preliminary factor sampling (Morris 1991). Furthermore, additional polynomial approximations to responses other than Z(x) may prove useful as well.

3. Distributional Analysis. Additional non-parametric analysis, such as quantile, median, and skewness estimates, would further characterize the underlying distribution of h(x,ω,T). As with tolerance limits, such information would provide additional insight to the decision-maker.

6.4  Conclusions 

Since  its  introduction  40  years  ago,  researchers  have  devoted  considerable 
theoretical  and  empirical  research  into  understanding  and  solving  two-stage 
stochastic  linear  programming  with  recourse.  During  this  same  period  simulation 
—  including  the  related  fields  of  experimental  design,  variance  reduction,  and 
response  surface  methodology  —  developed  into  a  powerful  method  of  analysis 
for problems inherently stochastic in nature. This dissertation represents a formal
synthesis of these two fields: an investigation into how to apply the methods of
one to get answers and insight about the other. In so doing it brings a new
philosophy  to  solving  an  old  problem  while  opening  additional  avenues  of 
research.  It  accomplishes  this  from  a  tactical  point-of-view  by  providing  new 
techniques  for  efficiently  and  accurately  solving  the  classic  optimization  problem. 
Most  importantly,  from  a  strategic  perspective  this  research  introduces  the  equally 
important  topics  of  sensitivity  and  distributional  analysis  by  demonstrating  their 
viability  with  respect  to  this  class  of  stochastic  linear  programming  problems. 
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Appendix 


Table A.1

Experimental Design for 20TERM†

   x2   x4   x7  x10  x12  x13  x14  x21  x37  x51  x62   Z_RS(x_k)   Z_CV(x_k)   Z_LH(x_k)
    1   -1   -1    1    1   -1   -1   -1   -1   -1   -1   277197.81   277058.50   276014.53
    1    1   -1    1   -1    1   -1   -1   -1   -1   -1   275254.03   275498.84   272648.09
    1   -1    1   -1    1    1   -1   -1   -1   -1   -1   272360.13   272067.63   273361.38
    1    1    1   -1   -1   -1   -1   -1   -1   -1   -1   279219.50   278849.44   277337.97
    1   -1   -1   -1   -1    1    1   -1   -1   -1   -1   280607.00   280422.78   279328.13
    1    1   -1   -1    1   -1    1   -1   -1   -1   -1   276574.34   275890.34   275684.44
    1   -1    1    1   -1   -1    1   -1   -1   -1   -1   279598.56   278392.69   277554.34
    1    1    1    1    1    1    1   -1   -1   -1   -1   274301.03   275076.59   272743.44
   -1   -1   -1    1    1    1   -1    1   -1   -1   -1   270083.28   270175.94   273545.63
   -1    1   -1    1   -1   -1   -1    1   -1   -1   -1   272570.59   273130.00   275833.47
   -1   -1    1   -1    1   -1   -1    1   -1   -1   -1   277644.00   277744.44   275453.75
   -1    1    1   -1   -1    1   -1    1   -1   -1   -1   273914.78   273089.50   272481.97
   -1   -1   -1   -1   -1   -1    1    1   -1   -1   -1   282453.75   282164.84   282576.19
   -1    1   -1   -1    1    1    1    1   -1   -1   -1   272598.69   272831.00   272052.38
   -1   -1    1    1   -1    1    1    1   -1   -1   -1   275134.81   276479.47   274012.63
   -1    1    1    1    1   -1    1    1   -1   -1   -1   272995.66   272986.84   272015.53
   -1   -1   -1    1   -1    1   -1   -1    1   -1   -1   279808.19   279251.34   278463.38
   -1    1   -1    1    1   -1   -1   -1    1   -1   -1   276978.41   277362.06   274477.59
   -1   -1    1   -1   -1   -1   -1   -1    1   -1   -1   280626.34   281349.03   283072.91
   -1    1    1   -1    1    1   -1   -1    1   -1   -1   273965.31   273058.50   272286.69
   -1   -1   -1   -1    1   -1    1   -1    1   -1   -1   284488.38   284249.03   281871.84
   -1    1   -1   -1   -1    1    1   -1    1   -1   -1   276674.31   276447.94   278073.81
   -1   -1    1    1    1    1    1   -1    1   -1   -1   274631.94   274242.44   272161.03
   -1    1    1    1   -1   -1    1   -1    1   -1   -1   277075.63   276338.91   276190.97
    1   -1   -1    1   -1   -1   -1    1    1   -1   -1   273376.38   273977.06   274504.50
    1    1   -1    1    1    1   -1    1    1   -1   -1   272160.13   272903.66   273301.03
    1   -1    1   -1   -1    1   -1    1    1   -1   -1   269848.13   268495.13   272287.09
    1    1    1   -1    1   -1   -1    1    1   -1   -1   269188.69   267365.28   269757.84
    1   -1   -1   -1    1    1    1    1    1   -1   -1   272380.59   272371.56   270562.81
    1    1   -1   -1   -1   -1    1    1    1   -1   -1   274011.84   272677.84   273875.00
    1   -1    1    1    1   -1    1    1    1   -1   -1   272288.75   273286.66   271861.19
    1    1    1    1   -1    1    1    1    1   -1   -1   271703.63   272262.34   273023.16
   -1   -1   -1   -1    1    1   -1   -1   -1    1   -1   278668.09   278596.19   277745.78
   -1    1   -1   -1   -1   -1   -1   -1   -1    1   -1   281917.41   282506.78   282057.44
   -1   -1    1    1    1   -1   -1   -1   -1    1   -1   272411.94   272246.94   275982.03
   -1    1    1    1   -1    1   -1   -1   -1    1   -1   271475.09   271670.84   272828.09
   -1   -1   -1    1   -1   -1    1   -1   -1    1   -1   282052.63   281011.19   282387.00
   -1    1   -1    1    1    1    1   -1   -1    1   -1   271320.00   270516.91   271322.16
   -1   -1    1   -1   -1    1    1   -1   -1    1   -1   276208.63   276268.44   279483.06
   -1    1    1   -1    1   -1    1   -1   -1    1   -1   277279.38   278598.91   275363.31
    1   -1   -1   -1    1   -1   -1    1   -1    1   -1   273663.25   273341.53   273864.41
    1    1   -1   -1   -1    1   -1    1   -1    1   -1   273276.91   273578.56   270851.00

Table A.1 (continued)


x2, x4, x7, x10, x12, x13, x14, x21, x37, x51, x62


1 

-1 

1 

1 

1 

1  -1 

i  i 

4 

i 

1 

1 

1 

1 

-1 

-1  -1 

[  i 

-i 

i 

1 

-1 

-1 

1 

-1 

1  1 

i 

4 

1 

1 

1 

-1 

1 

1 

-1  1 

i 

-i 

i 

1 

-1 

1 

-1 

-1 

-1  1 

i 

4 

i 

1 

1 

1 

-1 

1 

1  1 

1 

-i 

i 

1 

-1 

-1 

-1 

-1 

-1  -1  -1 

1 

1 

1 

1 

-1 

-1 

1 

1  -1  -1 

1 

i 

1 

-1 

1 

1 

-1 

1  -1  -1 

i 

i 

1 

1 

1 

1 

1 

-1  -1  -1 

i 

i 

1 

-1 

-1 

1 

1 

1  1 

i  -i 

i 

i 

1 

1 

-1 

1 

-1 

-1  1  -1 

i 

i 

1 

-1 

1 

-1 

1 

-1  1  -1 

i 

i 

1 

1 

1 

-1 

-1 

1  1  -1 

i 

i 

-1 

-1 

-1 

-1 

-1 

1  - 

L  1 

1 

1 

-1 

1 

-1 

-1 

1 

-1  - 

L  1 

1 

1 

-1 

-1 

1 

1 

-1 

4  - 

i  i 

i 

1 

-1 

1 

1 

1 

1 

1  - 

i  i 

1 

i 

-1 

-1 

-1 

1 

1 

-i 

i  i 

i 

i 

-1 

1 

-1 

1 

-1 

i 

i  i 

i 

i 

-1 

-1 

1 

-1 

1 

i 

L  1 

i 

1 

-1 

1 

1 

-1 

-1 

-i 

L  1 

i 

1 

-1 

-1 

-1 

-1 

-1 

i  - 

1  -1 

-i 

4 

-1 

1 

-1 

-1 

1 

-i  - 

1  -1 

4 

-i 

-1 

-1 

1 

1 

-1 

-1  - 

1  -1 

4 

-i 

-1 

1 

1 

1 

1 

i  - 

1  -1 

4 

-i 

-1 

-1 

-1 

1 

1 

-1 

l  -1 

4 

-i 

-1 

1 

-1 

1 

-1 

i 

1  -1 

4 

-i 

-1 

-1 

1 

-1 

1 

i 

1  -1 

4 

-i 

-1 

1 

1 

-1 

-1 

-1 

1  -1 

4 

-i 

1 

-1 

-1 

-1 

-1 

-i  - 

1  1 

4 

-i 

1 

1 

-1 

-1 

1 

i  - 

1  1 

4 

-i 

1 

-1 

1 

1 

-1 

1  - 

1  1 

4 

-i 

1 

1 

1 

1 

1 

-1  - 

1  1 

4 

-i 

1 

-1 

-1 

1 

1 

1 

1  1 

4 

-i 

1 

1 

-1 

1 

-1 

-i 

1  1 

4 

-i 

1 

-1 

1 

-1 

1 

4 

1  1 

4 

-i 

1 

1 

1 

-1 

-1 

1 

1  1 

4 

-i 

1 

-1 

-1 

-1 

1 

4  - 

1  -1 

1 

-i 

1 

1 

-1 

-1 

-1 

1  - 

1  -1 

1 

-i 

1 

-1 

1 

1 

1 

1  - 

1  -1 

1 

-i 

1 

1 

1 

1 

-1 

4  - 

1  -1 

1 

-i 

1 

-1 

-1 

1 

-1 

1 

1  -1 

1 

-i 

1 

1 

-1 

1 

1 

4 

1  -1 

1 

-i 

1 

-1 

1 

-1 

-1 

4 

1  -1 

1 

-i 

1 

1 

1 

-1 

1 

1 

1  -1 

1 

-i 

-1 

-1 

-1 

-1 

1 

1  - 

1  1 

1 

-i 

-1 

-1 

-1 

-1 

-1 

-1 

-1 

-1 

-1 

-1 

-1 

-1 

-1 

-1 

-1 

-1 

-1 

-1 

-1 

-1 

-1 

-1 

1 

1 

1 

1 

1 

1 

1 

1 

1 

1 

1 

1 

1 

1 

1 

1 

1 

1 

1 

1 

1 

1 

1 

1 

1 


   Z_RS(x_k)   Z_CV(x_k)   Z_LH(x_k)
   277663.72   281379.84   277887.09
   269996.59   271297.16   270683.16
   273338.19   272963.75   271529.03
   270203.00   273442.84   270532.38
   271198.97   270923.31   274892.81
   278212.16   278307.97   275093.44
   282820.13   282816.97   281126.63
   267157.88   266390.00   269838.88
   271664.19   272222.56   271625.06
   271524.50   271403.19   268896.81
   273551.25   273914.41   270289.19
   275612.81   275430.56   274236.25
   270732.84   270923.06   274300.59
   272924.81   273094.56   271315.56
   276751.88   275863.72   276010.91
   272564.56   272284.56   272252.25
   274431.34   273284.44   274364.41
   277532.47   277129.44   274888.78
   273819.63   274121.69   272788.19
   267480.81   267363.72   270872.63
   272635.06   273049.06   270784.69
   273861.03   274057.19   273490.38
   280768.50   281276.47   282136.56
   276869.47   276409.56   278486.56
   280619.06   280726.59   280595.09
   271124.44   272042.03   271577.72
   275369.59   276431.94   278664.19
   273584.34   273295.66   275431.22
   271785.16   271603.53   275384.75
   279768.41   279507.44   280090.22
   277305.41   277003.03   278140.50
   276261.97   276185.06   270848.19
   276775.63   277764.69   273649.69
   274530.00   274086.09   273728.31
   272895.50   272417.06   273177.44
   272288.00   271969.41   271461.47
   273157.72   272729.75   271950.41
   270103.81   270765.25   270262.00
   276202.16   277332.16   277291.84
   275069.81   274744.00   273809.66
   269949.88   271796.31   271299.13
   276457.81   276699.44   272069.50
   274575.00   274601.72   274165.84
   270072.69   271580.91   271040.44
   280422.13   280294.19   278913.28
   268220.09   267690.31   269662.53
   271615.97   271024.00   272807.28

Table A.1 (continued)


x21, x37, x51, x62

-1 

1 

-1 

-1 

-i 

-1 

-1 

1 

1 

-1 

-1 

1 

1 

1 

-1 

-1 

1 

1 

-1 

1 

1 

1 

-1 

1 

-1 

1 

1 

-1 

-1 

-1 

1 

-1 

-1 

1 

1 

1 

-1 

1 

-1 

1 

1 

1 

1 

1 

1 

-1 

-1 

1 

-1 

-i 

1 

1 

1 

1 

-1 

1 

1 

-1 

1 

-1 

1 

1 

1 

1 

-1 

-1 

1 

-i 

-1 

-1 

-1 

-1 

1 

1 

-1 

1 

i 

1 

-1 

-1 

-1 

1 

-1 

1 

-1 

-i 

1 

-1 

-1 

-1 

1 

1 

1 

-1 

1 

-1 

-1 

-1 

-1 

1 

-1 

-1 

-1 

i 

1 

1 

-1 

-1 

1 

1 

-1
1

1
1

1 

1 

1 

1 

1 

1 

1 

1 

1 

1 

0 

0 

0 

0 

0 

0 

0

   Z_RS(x_k)   Z_CV(x_k)   Z_LH(x_k)
   275685.03   275554.16   276729.00
   268890.28   269699.00   272272.69
   272479.72   272408.91   272452.13
   274211.63   273738.31   277067.81
   267583.94   267217.88   270455.41
   273458.56   272788.25   273689.88
   271418.94   271980.63   271238.47
   278552.06   278902.38   278077.56
   270402.06   270277.66   269766.91
   271874.88   272181.34   275105.56
   270349.00   269449.66   271082.63
   274688.63   274838.94   272888.22
   278903.19   278913.84   277762.34
   274149.09   273514.03   272103.28
   268679.03   267704.84   270237.09
   275773.53   276179.81   273628.97
   270014.44   270289.34   270501.03
   277562.13   277240.97   277561.00
   270610.75   272090.84   270555.66
   281698.16   281097.84   276109.38
   274660.88   273686.75   273048.47
   275039.22   278370.06   273290.31
   268650.34   269092.06   271847.94
   272377.22   271524.91   272687.69
   278669.19   279366.28   276824.19
   279659.63   279290.28   277272.72
   274125.28   273539.69   273429.53
   283753.78   283152.31   283831.47
   272916.91   272017.19   271852.59
   270736.72   270587.56   274074.19
   271719.44   272061.16   271237.88
   272845.72   271514.84   269328.56
   268399.06   267801.91   268868.69
   269695.34   270385.41   272674.16
   268000.31   268926.25   269682.56
   269832.47   270567.69   272308.13
   269555.75   269303.91   268693.81
   267142.34   267987.06   270404.22
   280877.28   280925.84   280150.44
   262547.16   262891.19   262254.28
   280458.16   281605.63   280449.28
   260381.20   262270.34   257491.28
   268911.56   268994.94   267776.72
   263901.41   266615.88   262171.47
   273256.19   272627.66   271253.25
   266695.69   265269.19   262282.09
   279600.22   279559.66   281936.09

   x2   x4   x7  x10  x12  x13  x14  x21  x37  x51  x62   Z_RS(x_k)   Z_CV(x_k)   Z_LH(x_k)
    0    0    0    0    a    0    0    0    0    0    0   262767.47   262977.56   266524.84
    0    0    0    0   -a    0    0    0    0    0    0   289449.34   289624.91   288697.25
    0    0    0    0    0    a    0    0    0    0    0   265740.59   264797.31   267964.00
    0    0    0    0    0   -a    0    0    0    0    0   277653.84   278618.81   280118.75
    0    0    0    0    0    0    a    0    0    0    0   264566.13   265778.88   261836.02
    0    0    0    0    0    0   -a    0    0    0    0   281942.59   281789.44   278625.28
    0    0    0    0    0    0    0    a    0    0    0   267259.78   267273.06   265554.47
    0    0    0    0    0    0    0   -a    0    0    0   293418.78   293111.38   292761.38
    0    0    0    0    0    0    0    0    a    0    0   254724.94   254350.55   254842.89
    0    0    0    0    0    0    0    0   -a    0    0   267867.94   268843.72   270128.69
    0    0    0    0    0    0    0    0    0    a    0   259339.08   259064.28   256909.66
    0    0    0    0    0    0    0    0    0   -a    0   264690.44   264783.06   266351.53
    0    0    0    0    0    0    0    0    0    0    a   256759.03   256634.92   257119.61
    0    0    0    0    0    0    0    0    0    0   -a   263405.75   264139.34   266553.50
    0    0    0    0    0    0    0    0    0    0    0   256393.22   255784.72   256009.81
    0    0    0    0    0    0    0    0    0    0    0   255674.83   255718.84   254763.36
    0    0    0    0    0    0    0    0    0    0    0   253818.95   253197.25   255341.89
    0    0    0    0    0    0    0    0    0    0    0   253555.20   253433.41   254785.86
    0    0    0    0    0    0    0    0    0    0    0   252856.48   252810.88   255575.00
    0    0    0    0    0    0    0    0    0    0    0   255490.61   255577.50   255235.52
    0    0    0    0    0    0    0    0    0    0    0   252902.50   254253.78   255041.83
    0    0    0    0    0    0    0    0    0    0    0   253241.92   254443.30   255877.17
    0    0    0    0    0    0    0    0    0    0    0   256394.80   256144.66   255931.16
    0    0    0    0    0    0    0    0    0    0    0   254077.73   253923.22   255685.38
    0    0    0    0    0    0    0    0    0    0    0   254993.14   254751.05   255091.02
    0    0    0    0    0    0    0    0    0    0    0   257465.98   257531.95   255311.55
    0    0    0    0    0    0    0    0    0    0    0   254823.16   255137.92   255446.63
    0    0    0    0    0    0    0    0    0    0    0   255125.48   254839.91   255863.88
    0    0    0    0    0    0    0    0    0    0    0   255648.98   255481.08   255543.17
    0    0    0    0    0    0    0    0    0    0    0   258436.55   257506.67   254989.86
    0    0    0    0    0    0    0    0    0    0    0   257079.67   255383.59   255698.77
    0    0    0    0    0    0    0    0    0    0    0   254280.70   254184.72   254915.28
    0    0    0    0    0    0    0    0    0    0    0   252949.27   252760.42   255062.73
    0    0    0    0    0    0    0    0    0    0    0   260193.55   260242.17   255693.64
    0    0    0    0    0    0    0    0    0    0    0   256091.58   256139.03   255246.78
    0    0    0    0    0    0    0    0    0    0    0   257106.16   254932.30   255796.36
    0    0    0    0    0    0    0    0    0    0    0   254390.89   254333.63   255222.17
    0    0    0    0    0    0    0    0    0    0    0   255581.75   255756.41   255611.36
    0    0    0    0    0    0    0    0    0    0    0   255913.77   256291.66   255996.63
    0    0    0    0    0    0    0    0    0    0    0   257371.08   258470.84   255658.77
    0    0    0    0    0    0    0    0    0    0    0   254371.97   254671.95   255388.92

† '0' codes represent center-point values from Table 5.34. Half-ranges are 7.5 for x2, x7, x10, x14,
and x62; 10 for x12, x13, and x21; and 5 for x4, x37, and x51. All ±a values are ±3.3555.
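
The coding described in the footnote can be illustrated with a short sketch. It assumes the usual central composite convention, natural value = center value + code × half-range, with axial points at a code of magnitude 3.3555. The center values used below are placeholders, since Table 5.34 is not reproduced in this appendix, and the function name `decode` is hypothetical.

```python
# Half-ranges as listed in the table footnote.
HALF_RANGE = {
    "x2": 7.5, "x4": 5.0, "x7": 7.5, "x10": 7.5, "x12": 10.0, "x13": 10.0,
    "x14": 7.5, "x21": 10.0, "x37": 5.0, "x51": 5.0, "x62": 7.5,
}
A = 3.3555  # axial distance in coded units, from the footnote


def decode(coded, center):
    """Map coded levels (-1, 0, 1, +/-A) to natural units.

    Factors missing from `coded` are taken to be at their center (code 0).
    """
    unknown = set(coded) - set(HALF_RANGE)
    if unknown:
        raise ValueError(f"unknown factors: {sorted(unknown)}")
    return {
        name: center[name] + coded.get(name, 0.0) * half
        for name, half in HALF_RANGE.items()
    }


# Placeholder centers (assumption): the actual values are in Table 5.34.
center = {name: 100.0 for name in HALF_RANGE}

# Axial design point with x12 at +a and every other factor at its center.
axial = decode({"x12": A}, center)
```

Under these placeholder centers, the axial point moves x12 by 3.3555 × 10 = 33.555 natural units away from its center while leaving the other ten factors unchanged.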